Abstract

This paper studies fingerprinting games in which the number of colluders and the collusion
channel are unknown. The fingerprints are embedded into host sequences representing signals
to be protected and provide the receiver with the capability to trace back pirated copies to the
colluders. The colluders and the fingerprint embedder are subject to signal fidelity constraints.
Our problem setup unifies the signal-distortion and Boneh-Shaw formulations of fingerprint- 
ing. The fundamental tradeoffs between fingerprint codelength, number of users, and fidelity 
constraints are then determined. 

Several bounds on fingerprinting capacity have been presented in recent literature. This 
paper derives exact capacity formulas and presents a new randomized fingerprinting scheme 
with the following properties: (1) the encoder and receiver do not need to know the coalition 
size and collusion channel; (2) a tunable parameter $\Delta$ trades off false-positive and false-negative
error exponents; (3) the receiver provides a reliability metric for its decision; and (4) the scheme
is capacity-achieving when the false-positive exponent $\Delta$ tends to zero and the coalition size is
known to the encoder.

A fundamental component of the new scheme is the use of a "time-sharing" randomized sequence.
The decoder is a maximum penalized mutual information decoder, where the significance
of each candidate coalition is assessed relative to a threshold, and the penalty is proportional
to the coalition size. A much simpler threshold decoder that satisfies properties (1)-(3) above
but not (4) is also given.
decoder, channel coding with side information, capacity, strong converse, error exponents, multiple
access channels, model order selection. 
1 Introduction



Content fingerprinting (a.k.a. digital fingerprinting, or traitor tracing) is essentially a multiuser 
version of watermarking. A covertext — such as image, video, audio, text, or software — is to be 
distributed to many users. Prior to distribution, each user is assigned a fingerprint that is embedded
into the covertext. In a collusion attack, a coalition of users combine their marked copies, creating
a pirated copy that contains only weak traces of their fingerprints. The pirated copy is subject
to a fidelity requirement relative to the coalition's copies. The fidelity requirement may take the
form of a distortion constraint, which is a natural model for media fingerprinting applications [1-7];
or it may take the form of Boneh and Shaw's marking assumption, which is a popular model for
software fingerprinting [8-10]. To trace the forgery back to the coalition members, one needs a
fingerprinting scheme that can reliably identify the colluders' fingerprints from the pirated copy.

The fingerprinting problem presents two key challenges. 

1. The number of colluders may be large, which makes it easier for the colluders to mount a
strong attack. The difficulty of the decoding problem is compounded by the fact that the 
number of colluders and the collusion channel are unknown to the encoder and decoder. 

2. There are two fundamental types of error events, namely false positives, by which innocent 
users are wrongly accused, and false negatives, by which one or more colluders escape detec- 
tion. For legal reasons, a maximum admissible value for the false-positive error probability 
should be specified. 

This paper proposes a mathematical model that satisfies these requirements and derives the corre- 
sponding information-theoretic performance limits. Prior art on related formulations of the finger- 
printing problem is reviewed below. 

The basic performance metric is capacity, which is defined with respect to a class of collusion 
channels. A multiuser data hiding problem was analyzed by Moulin and O'Sullivan [3, Sec. 8], and 
capacity expressions were obtained assuming a compound class of memoryless channels,
expected-distortion constraints for the distributor and the coalition, and noncooperating, single-user
decoders. Despite clear mathematical similarities, this setup is quite different from the one adopted
in more recent fingerprinting papers. Somekh-Baruch and Merhav [4, 5] studied a fingerprinting 
problem with a known number of colluders and explored connections with the problem of coding for 
the multiple-access channel (MAC). The notion of false positives does not appear in their problem 
formulation. Lower bounds on capacity were obtained assuming almost-sure distortion constraints 
between the pirated copy and one [4] or all [5] of the coalition's copies. The lower bounds on 
capacity correspond to a restrictive encoding strategy, namely random constant-composition codes 
without time-sharing. 

Other bounds on capacity and connections between MACs and fingerprinting under the Boneh- 
Shaw assumption have been recently studied by Anthapadmanabhan et al. [10]. The covertext is 
degenerate, and side information does not appear in the information-theoretic formulation of this 
problem. 

In order to cope with unknown collusion channels and unknown number of colluders, a special 
kind of universal decoder should be designed, where universality holds not only with respect to 
some set of channels, but also with respect to an unknown number of inputs. A tunable parameter
should trade off the two fundamental types of error probability. When the number of colluders is 
unknown, two extreme instances of this tradeoff are to accuse all users or none of them. 

While fingerprinting capacity is a fundamental measure of the ability of any scheme to resist 
colluders, it only guarantees that the error probabilities vanish if the codes are "long enough". 
Error exponents provide a finer description of system performance. They provide estimates of the 
necessary length of a fingerprinting code that can withstand a specified number of colluders, given 
target false-positive and false-negative error probabilities. This is especially valuable in any legal 
system where the reliability of accusations should be assessed. 

Besides capacity and error-exponent formulas, the information-theoretic analysis sheds light
on the structure of optimal codes. Particularly relevant in this respect is a random coding
scheme by Tardos [9], which uses an auxiliary random sequence for encoding fingerprints. While his 
scheme is presented at an algorithmic level (and no optimization was involved in its construction), 
in our game-theoretic setting the auxiliary random variable appears fundamentally as part of a 
randomized strategy in an information-theoretic game whose payoff function is nonconcave with 
respect to the maximizing variable (the fingerprint distribution). 

Another issue that can be resolved in our game-theoretic setting is the optimality of coalition 
strategies that are invariant to permutations of the colluders. While one may heuristically expect 
that such strategies are optimal, a proof of this property is established in this paper. The ap- 
proach used in previous papers was to assume that coalitions employ such strategies, but often no 
performance guarantee is given if the colluders employ asymmetric strategies. 

Finally, in [9] and in the signal processing literature, several simple algorithms have been pro- 
posed to detect colluders, involving computing some correlation score between pirated copy and 
users' fingerprints, and setting up a detection threshold. We study the limits of such strategies and 
compare them with joint decoding strategies. 

1.1 Organization of This Paper 

As indicated by the bibliographic references, probabilistic analyses of digital fingerprinting have 
been reported both in the information theory literature and in the theoretical computer science 
literature. While the results derived in this paper are put in the context of related information- 
theoretic work, especially multiple-access channels, this paper is nevertheless intended to be ac- 
cessible to a broader community of readers that are trained in probability theory and statistics. 
The main tools used in our derivations are the method of types [11, 12] for analyzing random- 
coding schemes, Fano's lemma for deriving upper bounds on capacity, and elementary properties 
of information-theoretic functionals. 

A mathematical statement of our generic fingerprinting problem is given in Sec. 2, together
with the definitions of codes, collusion channels, error probabilities, capacity, and error exponents.
Our first main results are two fingerprinting capacity theorems and are stated in Sec. 3.

The next two sections present the new random coding scheme and the resulting error exponents. 
Sec. 4 presents a simple but suboptimal decoder that compares empirical mutual information scores
between received data and individual fingerprints, and outputs a guilty decision whenever the score 
exceeds a certain tunable threshold. This suboptimal decoder is closely related to strategies used in 
the signal processing literature and in [9]. For simplicity of the exposition, the scheme and results
are presented in the setup with degenerate side information, which is directly applicable to the 
Boneh-Shaw problem. Sec. 5 introduces and analyzes a more elaborate joint decoder that assigns
a penalized empirical equivocation score to candidate coalitions and selects the coalition with the 
lowest score. The penalty is proportional to coalition size. The joint decoder is capacity-achieving. 
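
As a rough illustration of the simple threshold decoder outlined above, the following Python sketch scores each user by the empirical mutual information between that user's fingerprint and the pirated copy, and accuses every user whose score exceeds a threshold. It is a minimal sketch under simplifying assumptions: the side information is degenerate, no time-sharing sequence is conditioned on, and the names `empirical_mutual_information`, `threshold_decode`, and `threshold` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def empirical_mutual_information(x, y):
    """Empirical mutual information I(x; y) in bits, computed from the joint type."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    # I = sum_{a,b} p(a,b) log2( p(a,b) / (p(a) p(b)) ), with counts c = n p(a,b)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def threshold_decode(fingerprints, y, threshold):
    """Return the set of users whose score exceeds the threshold (possibly empty)."""
    return {
        m for m, x in fingerprints.items()
        if empirical_mutual_information(x, y) > threshold
    }


# Toy usage: user 1's fingerprint is copied verbatim, user 2's is unrelated.
fingerprints = {1: [0, 1, 0, 1, 0, 1, 0, 1], 2: [0, 0, 1, 1, 0, 0, 1, 1]}
pirated = list(fingerprints[1])
accused = threshold_decode(fingerprints, pirated, threshold=0.5)
```

Raising the threshold makes accusations rarer, which is the qualitative tradeoff between false positives and false negatives that the tunable parameter controls; a threshold above every score yields the empty set.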

Sec. 6 outlines an extension to the problem where the collusion channel is memoryless. The
proofs of the main results appear in Secs. 7-10, and the paper concludes in Sec. 11.

1.2 Notation 

We use uppercase letters for random variables, lowercase letters for their individual values,
calligraphic letters for finite alphabets, and boldface letters for sequences. Given an integer $K$, we use
the special symbol $\mathsf{K}$ for the set $\{1, 2, \cdots, K\}$. We denote by $\mathcal{M}^*$ the set of sequences of arbitrary
length (including 0) whose elements are in $\mathcal{M}$. The probability mass function (p.m.f.) of a random
variable $X \in \mathcal{X}$ is denoted by $p_X = \{p_X(x),\, x \in \mathcal{X}\}$. The entropy of a random variable $X$ is
denoted by $H(X)$, and the mutual information between two random variables $X$ and $Y$ is denoted
by $I(X;Y) = H(X) - H(X|Y)$. Should the dependency on the underlying p.m.f.'s be explicit, we
write the p.m.f.'s as subscripts, e.g., $H_{p_X}(X)$ and $I_{p_X p_{Y|X}}(X;Y)$. The Kullback-Leibler divergence
between two p.m.f.'s $p$ and $q$ is denoted by $D(p\|q)$, and the conditional Kullback-Leibler divergence
of $p_{Y|X}$ and $q_{Y|X}$ given $p_X$ is denoted by $D(p_{Y|X}\|q_{Y|X}|p_X) = D(p_{Y|X}\,p_X\|q_{Y|X}\,p_X)$. All logarithms
are in base 2 unless specified otherwise.

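Given a sequence $\mathbf{x} \in \mathcal{X}^N$, denote by $p_{\mathbf{x}}$ its type, or empirical p.m.f. over $\mathcal{X}$. Denote by $T_{\mathbf{x}}$
the type class associated with $p_{\mathbf{x}}$, i.e., the set of all sequences of type $p_{\mathbf{x}}$. Likewise, $p_{\mathbf{xy}}$ denotes
the joint type of a pair of sequences $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^N \times \mathcal{Y}^N$, and $T_{\mathbf{xy}}$ the associated joint type class.
The conditional type $p_{\mathbf{y}|\mathbf{x}}$ of a pair of sequences $(\mathbf{x}, \mathbf{y})$ is defined by $p_{\mathbf{xy}}(x,y)/p_{\mathbf{x}}(x)$ for all $x \in \mathcal{X}$
such that $p_{\mathbf{x}}(x) > 0$. The conditional type class $T_{\mathbf{y}|\mathbf{x}}$ given $\mathbf{x}$ is the set of all sequences $\mathbf{y}$ such
that $(\mathbf{x}, \mathbf{y}) \in T_{\mathbf{xy}}$. We denote by $H(\mathbf{x})$ the empirical entropy of the p.m.f. $p_{\mathbf{x}}$, by $H(\mathbf{y}|\mathbf{x})$ the
empirical conditional entropy, and by $I(\mathbf{x};\mathbf{y})$ the empirical mutual information for the joint p.m.f.
$p_{\mathbf{xy}}$. Recall that the number of types and conditional types is polynomial in $N$ and that [11]

$$(N+1)^{-|\mathcal{X}|}\, 2^{N H(\mathbf{x})} \le |T_{\mathbf{x}}| \le 2^{N H(\mathbf{x})}, \tag{1.1}$$
$$(N+1)^{-|\mathcal{X}||\mathcal{Y}|}\, 2^{N H(\mathbf{y}|\mathbf{x})} \le |T_{\mathbf{y}|\mathbf{x}}| \le 2^{N H(\mathbf{y}|\mathbf{x})}. \tag{1.2}$$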
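
The type-class bound (1.1) can be checked numerically on short sequences. The sketch below computes the exact size of a type class as a multinomial coefficient and compares it with the two bounds; the helper names (`empirical_entropy`, `type_class_size`, `check_type_class_bounds`) are illustrative only, and the small multiplicative tolerance is an assumption made to absorb floating-point rounding.

```python
import math
from collections import Counter


def empirical_entropy(x):
    """Empirical entropy H(x) in bits of the type of sequence x."""
    n = len(x)
    return -sum((c / n) * math.log2(c / n) for c in Counter(x).values())


def type_class_size(x):
    """Exact |T_x|: the multinomial coefficient N! / prod_a (N p_x(a))!."""
    size = math.factorial(len(x))
    for c in Counter(x).values():
        size //= math.factorial(c)
    return size


def check_type_class_bounds(x, alphabet_size):
    """Check (N+1)^{-|X|} 2^{N H(x)} <= |T_x| <= 2^{N H(x)} for sequence x."""
    n = len(x)
    upper = 2 ** (n * empirical_entropy(x))
    lower = (n + 1) ** (-alphabet_size) * upper
    # Tolerance guards against rounding when the upper bound is met with equality.
    return lower <= type_class_size(x) <= upper * (1 + 1e-9)


x = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # binary sequence with six 0s and four 1s
size = type_class_size(x)            # number of sequences sharing this type
ok = check_type_class_bounds(x, alphabet_size=2)
```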

We use the calligraphic fonts $\mathscr{P}_X$ and $\mathscr{P}_X^{[N]}$ to represent the set of all p.m.f.'s and all empirical
p.m.f.'s, respectively, on the alphabet $\mathcal{X}$. Likewise, $\mathscr{P}_{Y|X}$ and $\mathscr{P}_{Y|X}^{[N]}$ denote the set of all conditional
p.m.f.'s and all empirical conditional p.m.f.'s on the alphabet $\mathcal{Y}$. The special symbol $\mathscr{W}_K$ will be
used to denote the feasible set of collusion channels $p_{Y|X_1,\cdots,X_K}$ that can be selected by a size-$K$
coalition.

Mathematical expectation is denoted by the symbol $\mathbb{E}$. The shorthands $a_N \doteq b_N$ and $a_N \,\dot{\le}\, b_N$
denote asymptotic relations in the exponential scale, respectively $\lim_{N\to\infty} \frac{1}{N} \log \frac{a_N}{b_N} = 0$ and
$\limsup_{N\to\infty} \frac{1}{N} \log \frac{a_N}{b_N} \le 0$. We define $|t|^+ = \max(t, 0)$ and $\exp_2(t) = 2^t$. The indicator function of
a set $A$ is denoted by $1\{x \in A\}$. The symbol $A \setminus B$ is used to denote the relative complement (or
set-theoretic difference) of set $B$ in set $A$. (Note that $B$ is generally not a subset of $A$.) Finally,
we adopt the notational convention that the minimum of a function over an empty set is $+\infty$, and
the maximum is 0.



2 Problem Statement and Basic Definitions 



2.1 Overview 
Figure 1: Model for fingerprinting game, using randomized code $(f_N, g_N)$. In the Boneh-Shaw
setup, the host sequence $\mathbf{S}$ is degenerate and there is no distortion constraint ($D_1$). The class
$\mathscr{W}_K$ characterizes the fidelity constraint on the collusion channel. The encoder and decoder know
neither $K$ nor the collusion channel.



Our model for digital fingerprinting is diagrammed in Fig. 1. Let $\mathcal{S}$, $\mathcal{X}$, and $\mathcal{Y}$ be three
finite alphabets. The covertext sequence $\mathbf{S} = (S_1, \cdots, S_N) \in \mathcal{S}^N$ consists of independent and
identically distributed (i.i.d.) samples drawn from a p.m.f. $p_S(s)$, $s \in \mathcal{S}$. A secret key $V$ taking
values in an alphabet $\mathcal{V}_N$, whose cardinality potentially grows with $N$, is shared between encoder
and decoder, and not publicly revealed. The key $V$ is a random variable independent of $\mathbf{S}$. There
are $2^{NR}$ users, each of which receives a fingerprinted copy:

$$\mathbf{X}_m = f_N(\mathbf{S}, V, m), \quad 1 \le m \le 2^{NR}, \tag{2.3}$$

where $f_N : \mathcal{S}^N \times \mathcal{V}_N \times \{1, \cdots, 2^{NR}\} \to \mathcal{X}^N$ is the encoding function, and $m$ is the index of the
user. The fidelity requirement between $\mathbf{S}$ and $\mathbf{X}_m$ is expressed via a distortion constraint. Let
$d : \mathcal{S} \times \mathcal{X} \to \mathbb{R}^+$ be the distortion measure and $d^N(\mathbf{s}, \mathbf{x}) = \frac{1}{N}\sum_{i=1}^{N} d(s_i, x_i)$ the extension of this
measure to length-$N$ sequences. The code $f_N$ is subject to the distortion constraint

$$d^N(\mathbf{s}, \mathbf{x}_m) \le D_1, \quad 1 \le m \le 2^{NR}. \tag{2.4}$$



Let $\mathcal{K} = \{m_1, m_2, \cdots, m_K\}$ be a coalition of $K$ users; no constraints are imposed on the
formation of coalitions. The coalition uses its copies $\mathbf{X}_{\mathcal{K}} = \{\mathbf{X}_m,\, m \in \mathcal{K}\}$ to produce a pirated
copy $\mathbf{Y} \in \mathcal{Y}^N$. Without loss of generality, we assume that $\mathbf{Y}$ is generated stochastically according
to a conditional p.m.f. $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ called the collusion channel. This includes deterministic mappings
as a special case. A fidelity constraint is imposed on $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ to ensure that $\mathbf{Y}$ is "close" to the
fingerprinted copies $\mathbf{X}_m$, $m \in \mathcal{K}$. This constraint may take the form of a distortion constraint
(analogously to (2.4)), or alternatively, a constraint that will be referred to as the Boneh-Shaw
constraint. The formulation of these constraints is detailed below and results in the definition of a
feasible set $\mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$ for the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$.

The encoder and decoder know neither $K$ nor $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ selected by the $K$ colluders.^1 The decoder
has access to the pirated copy $\mathbf{Y}$, the host $\mathbf{S}$, and the secret key $V$. It produces an estimate

$$\hat{\mathcal{K}} = g_N(\mathbf{Y}, \mathbf{S}, V) \tag{2.5}$$

of the coalition. Success can be defined as catching one colluder or catching all colluders, the latter
task being seemingly much more difficult. An admissible decoder output is the empty set, $\hat{\mathcal{K}} = \emptyset$,
reflecting the possibility that the signal submitted to the decoder is unrelated to the fingerprints.
If this possibility were not allowed, an innocent user would be accused. Another good reason to
allow $\hat{\mathcal{K}} = \emptyset$ is simply that reliable detection is impossible when there are too many colluders, and
the constraint on the probability of false positives would be violated if $\hat{\mathcal{K}} = \emptyset$ were not an option.



2.2 Randomized Fingerprinting Codes 

The formal definition of a fingerprinting code is as follows. 

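Definition 2.1 A randomized rate-$R$ length-$N$ fingerprinting code $(f_N, g_N)$ with embedding
distortion $D_1$ is a pair of encoder mapping $f_N : \mathcal{S}^N \times \mathcal{V}_N \times \{1, 2, \cdots, \lceil 2^{NR} \rceil\} \to \mathcal{X}^N$ and decoder
mapping $g_N : \mathcal{Y}^N \times \mathcal{S}^N \times \mathcal{V}_N \to \{1, 2, \cdots, \lceil 2^{NR} \rceil\}^*$.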

Many kinds of randomization are possible; in the most general setting, the key space $\mathcal{V}_N$ can
grow superexponentially with $N$. For fingerprinting, three kinds of randomization seem to be
fundamental, each serving a different purpose. All three kinds can be combined. The first one is
randomized permutation of the letters $\{1, 2, \cdots, N\}$ to cope with channels with arbitrary memory,
similarly to [13].

Definition 2.2 A randomly modulated (RM) fingerprinting code is a randomized fingerprinting
code defined via permutations of a prototype $(\tilde{f}_N, \tilde{g}_N)$. The code is of the form

$$\mathbf{x}_m = f_N(\mathbf{s}, v, m) = \pi^{-1} \tilde{f}_N(\pi\mathbf{s}, w, m)$$
$$g_N(\mathbf{y}, \mathbf{s}, v) = \tilde{g}_N(\pi\mathbf{y}, \pi\mathbf{s}, w) \tag{2.6}$$

where $\pi$ is chosen uniformly from the set of all $N!$ permutations of the letters $\{1, 2, \cdots, N\}$ and
is not revealed publicly. The sequence $\pi\mathbf{x}_m$ is obtained by applying $\pi$ to the elements of $\mathbf{x}_m$. The
secret key is $V = (\pi, W)$, where $W$ is independent of $\pi$.
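
The random modulation of Definition 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: permutations are represented as index lists, "applying a permutation to a sequence" is taken to mean reading the sequence in permuted order, and the prototype encoder and decoder are passed in as arbitrary callables (`proto_encode`, `proto_decode`, and `toy_encode` are hypothetical names, not part of the paper).

```python
import random


def apply_perm(pi, seq):
    """Return the sequence whose i-th element is seq[pi[i]]."""
    return [seq[j] for j in pi]


def invert_perm(pi):
    """Return the inverse permutation, so apply_perm(inv, apply_perm(pi, x)) == x."""
    inv = [0] * len(pi)
    for i, j in enumerate(pi):
        inv[j] = i
    return inv


def rm_encode(proto_encode, pi, s, w, m):
    """RM encoder: permute the host, run the prototype encoder, undo the permutation."""
    return apply_perm(invert_perm(pi), proto_encode(apply_perm(pi, s), w, m))


def rm_decode(proto_decode, pi, y, s, w):
    """RM decoder: run the prototype decoder on the permuted forgery and host."""
    return proto_decode(apply_perm(pi, y), apply_perm(pi, s), w)


def toy_encode(s, w, m):
    """Hypothetical position-dependent prototype encoder, for demonstration only."""
    return [(si + wi * m + i) % 2 for i, (si, wi) in enumerate(zip(s, w))]


rng = random.Random(0)
host = [0, 1, 1, 0]
pi = rng.sample(range(len(host)), len(host))   # secret uniform permutation
x3 = rm_encode(toy_encode, pi, host, [1, 0, 1, 1], 3)
```

By construction, permuting the RM codeword recovers the prototype codeword for the permuted host, which is why the RM decoder can simply hand the permuted sequences to the prototype decoder.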

The second kind of randomization is uniform permutations of the $2^{NR}$ fingerprint assignments,
to equalize error probabilities over all possible coalitions [7,10].

^1 However, in order for our random coding scheme of Sec. 5 to be capacity-achieving, the encoder needs to know $K$.
Definition 2.3 A randomly permuted (RP) fingerprinting code is a randomized fingerprinting
code defined via permutations of a prototype $(\tilde{f}_N, \tilde{g}_N)$. The code is of the form

$$\mathbf{x}_m = f_N(\mathbf{s}, v, m) = \tilde{f}_N(\mathbf{s}, w, \pi^{-1}(m))$$
$$g_N(\mathbf{y}, \mathbf{s}, v) = \pi(\tilde{g}_N(\mathbf{y}, \mathbf{s}, w)) \tag{2.7}$$

where $\pi$ is chosen uniformly from the set of all $(2^{NR})!$ permutations of the user indices $\{1, 2, \cdots, 2^{NR}\}$
and is not revealed publicly. The secret key is $V = (\pi, W)$, where $W$ is independent of $\pi$. In (2.7),
we have used the shorthand $\pi(\mathcal{K}) = \{\pi(m),\, m \in \mathcal{K}\}$.

The third kind of randomization arises via an auxiliary "time-sharing" random sequence. This
strategy was not used in [4,5,10] but a nice example was developed by Tardos [9]. For binary
alphabets $\mathcal{S}$, $\mathcal{X}$, and $\mathcal{Y}$, i.i.d. random variables $W_i \in (0,1)$, $1 \le i \le N$, are generated, and next
the fingerprint letters $X_i(m)$ are generated as independent Bernoulli($W_i$) random variables. Here
$V = \{W_i,\, 1 \le i \le N\}$ is the secret key shared by encoder and decoder.
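
The time-sharing construction just described can be sketched as follows. The text only states that the $W_i$ are i.i.d. in $(0,1)$; the default uniform distribution used here is therefore an assumption made for illustration (it is not claimed to be the distribution used in [9]), and `generate_time_sharing_code`, `uniform_open`, and `bias_sampler` are hypothetical names.

```python
import random


def uniform_open(rng):
    """Draw a value in the open interval (0, 1) (assumed bias distribution)."""
    u = rng.random()
    while u == 0.0:          # random() lies in [0, 1); reject the endpoint 0
        u = rng.random()
    return u


def generate_time_sharing_code(num_users, n, rng, bias_sampler=uniform_open):
    """Draw W_1..W_N i.i.d., then each letter X_i(m) ~ Bernoulli(W_i) independently.

    Returns the secret key w (shared by encoder and decoder) and the codebook,
    one length-n binary fingerprint per user.
    """
    w = [bias_sampler(rng) for _ in range(n)]
    codebook = [
        [1 if rng.random() < wi else 0 for wi in w]
        for _ in range(num_users)
    ]
    return w, codebook


rng = random.Random(0)
key, codebook = generate_time_sharing_code(num_users=5, n=20, rng=rng)
```

A different bias distribution can be plugged in through `bias_sampler` without touching the rest of the sketch.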

Given an embedding distortion $D_1$ and a size-$K$ coalition using collusion channel in class $\mathscr{W}_K$,
there corresponds a capacity $C(D_1, \mathscr{W}_K)$ which is the supremum over $(f_N, g_N)$ of all achievable $R$,
under a prescribed error criterion.

2.3 Collusion Channels 

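First we define some basic terminology for MACs with $K$ inputs, common input alphabet $\mathcal{X}$, and
output alphabet $\mathcal{Y}$. Recall that $\mathsf{K} = \{1, 2, \cdots, K\}$ and let $X_{\mathsf{K}} = \{X_1, \cdots, X_K\}$. Given a conditional
p.m.f. $p_{Y|X_{\mathsf{K}}}$, consider the permuted conditional p.m.f.

$$p_{Y|X_{\pi(\mathsf{K})}}(y|x_1, \cdots, x_K) = p_{Y|X_{\mathsf{K}}}(y|x_{\pi(1)}, \cdots, x_{\pi(K)}) \tag{2.8}$$

where $\pi$ is any permutation of the $K$ inputs. We say that $p_{Y|X_{\mathsf{K}}}$ is permutation-invariant if

$$p_{Y|X_{\pi(\mathsf{K})}} = p_{Y|X_{\mathsf{K}}}, \quad \forall \pi.$$

A subset $\mathscr{W}_K$ of $\mathscr{P}_{Y|X_{\mathsf{K}}}$ is said to be permutation-invariant if

$$p_{Y|X_{\mathsf{K}}} \in \mathscr{W}_K \;\Rightarrow\; p_{Y|X_{\pi(\mathsf{K})}} \in \mathscr{W}_K, \quad \forall \pi.$$

In general, not all elements of $\mathscr{W}_K$ are permutation-invariant. The subset of $\mathscr{W}_K$ that consists of
permutation-invariant conditional p.m.f.'s will be denoted by

$$\mathscr{W}_K^{\mathrm{fair}} = \left\{ p_{Y|X_{\mathsf{K}}} \in \mathscr{W}_K \,:\, p_{Y|X_{\pi(\mathsf{K})}} = p_{Y|X_{\mathsf{K}}},\ \forall \pi \right\}. \tag{2.9}$$

Finally, if $\mathscr{W}_K$ is convex and permutation-invariant, the permutation-averaged conditional p.m.f.
$\frac{1}{K!}\sum_{\pi} p_{Y|X_{\pi(\mathsf{K})}}$ is also in $\mathscr{W}_K$ and is permutation-invariant by construction.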
In the fingerprinting problem, the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathscr{P}^{[N]}_{Y|X_{\mathsf{K}}}$ is a random variable whose
conditional distribution given $\mathbf{x}_{\mathcal{K}}$ depends on the collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$. The fidelity constraint
on the coalition is of the general form

$$\Pr\left[ p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}}) \right] = 1, \tag{2.10}$$

where for each $p_{\mathbf{x}_{\mathcal{K}}}$, $\mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$ is a convex, permutation-invariant subset of $\mathscr{P}_{Y|X_{\mathsf{K}}}$. That is, the
empirical conditional p.m.f. of the pirated copy given the marked copies is restricted. The choice
of the feasible set $\mathscr{W}_K$ depends on the application, as elaborated below. The explicit dependency of
$\mathscr{W}_K$ on $p_{\mathbf{x}_{\mathcal{K}}}$ will sometimes be omitted to simplify notation. Note that assuming $\mathscr{W}_K$ is
permutation-invariant does not imply that $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$ actually selected by the coalition is permutation-invariant.

The model (2.10) can be used to impose hard distortion constraints on the coalition or to enforce
the Boneh-Shaw marking assumption when $\mathcal{X} = \mathcal{Y}$.



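1. Distortion Constraints. Consider the following variation on the constraints used in [3-5].
Define a permutation-invariant estimator $f : \mathcal{X}^K \to \mathcal{S}$ which produces an estimate $\hat{S} = f(X_{\mathsf{K}})$
of the host signal sample based on the corresponding marked samples.^2 The estimator
could be, e.g., a maximum-likelihood estimator. Then

$$\mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}}) = \Big\{ p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \,:\, \sum_{x_{\mathsf{K}},\, y} p_{\mathbf{x}_{\mathcal{K}}}(x_{\mathsf{K}})\, p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}(y|x_{\mathsf{K}})\, d_2(f(x_{\mathsf{K}}), y) \le D_2 \Big\} \tag{2.11}$$

where $d_2 : \mathcal{S} \times \mathcal{Y} \to \mathbb{R}^+$ is the coalition's distortion function, and $D_2$ is the maximum allowed
distortion. The constraint (2.10) may be equivalently written as

$$\Pr\Big[ d_2^N(f(\mathbf{x}_{\mathcal{K}}), \mathbf{y}) = \frac{1}{N} \sum_{t=1}^{N} d_2(f(x_{\mathcal{K},t}), y_t) \le D_2 \Big] = 1. \tag{2.12}$$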



2. Interleaving Attack. Here each colluder contributes $N/K$ samples to the forgery, taken
at arbitrary positions. The class $\mathscr{W}_K$ is a singleton:

$$\mathscr{W}_K = \Big\{ p_{Y|X_{\mathsf{K}}}(y|x_{\mathsf{K}}) = \frac{1}{K} \sum_{k=1}^{K} 1\{y = x_k\} \Big\}. \tag{2.13}$$

3. Boneh-Shaw Marking Assumption. Assume $\mathcal{X} = \mathcal{Y}$ and $\mathscr{W}_K$ is the set of conditional
p.m.f.'s that satisfy

$$x_1 = \cdots = x_K \;\Rightarrow\; y = x_1. \tag{2.14}$$

Then the constraint (2.10) enforces the Boneh-Shaw marking assumption: the colluders are
not allowed to modify their samples at any location where these samples agree. Thus $y_t = x_{m_1,t}$
at any position $1 \le t \le N$ such that $x_{m_1,t} = \cdots = x_{m_K,t}$. Note that $\mathscr{W}_K$ does not
depend on $p_{\mathbf{x}_{\mathcal{K}}}$ and that the interleaving attack (2.13) satisfies the Boneh-Shaw condition.
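
The relation between the interleaving attack and the marking assumption can be illustrated with a short simulation. The sketch below is hypothetical: it assumes the sequence length is a multiple of the coalition size, chooses the positions contributed by each colluder uniformly at random, and uses function names that are not from the paper.

```python
import random


def interleaving_attack(copies, rng):
    """Build a forgery in which each of the K colluders contributes N/K samples."""
    k, n = len(copies), len(copies[0])
    assert n % k == 0, "sketch assumes N is a multiple of K"
    owners = [j for j in range(k) for _ in range(n // k)]
    rng.shuffle(owners)                      # arbitrary (here random) positions
    return [copies[owners[t]][t] for t in range(n)]


def satisfies_marking_assumption(copies, y):
    """True if y agrees with the copies at every position where all copies agree."""
    return all(
        y[t] == copies[0][t]
        for t in range(len(y))
        if all(c[t] == copies[0][t] for c in copies)
    )


copies = [
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 1, 0, 1, 1],
]
forgery = interleaving_attack(copies, random.Random(1))
```

Because every forged sample is taken from one of the colluders' copies, the forgery satisfies the marking assumption by construction, whereas an all-zero forgery would violate it at positions where all copies equal 1.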



2.4 Strongly Exchangeable Collusion Channels 

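Recall the definition of RM codes in (2.6): a dual notion applies to collusion channels. For any $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$
and permutation $\pi$ of $\{1, 2, \cdots, N\}$, define the permuted channel $p^{\pi}_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y}|\mathbf{x}_{\mathcal{K}}) = p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\pi\mathbf{y}|\pi\mathbf{x}_{\mathcal{K}})$.
Then we have

^2 A permutation-invariant estimator depends on the samples $\{X_k,\, k \in \mathsf{K}\}$ only via their empirical distribution on $\mathcal{X}$.

Definition 2.4 [4] A strongly exchangeable collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ is one such that
$p^{\pi}_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y}|\mathbf{x}_{\mathcal{K}})$ is independent of $\pi$, for every $(\mathbf{x}_{\mathcal{K}}, \mathbf{y})$.

A strongly exchangeable collusion channel is defined by a probability assignment $\Pr[T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}]$ on
the conditional type classes. The distribution of $\mathbf{Y}$ conditioned on $\mathbf{Y} \in T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$ is uniform:

$$p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\tilde{\mathbf{y}}|\mathbf{x}_{\mathcal{K}}) = \frac{\Pr[T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}]}{|T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}|}, \quad \forall \tilde{\mathbf{y}} \in T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}. \tag{2.15}$$

In Sec. 2.6 we show that for RM codes $(f_N, g_N)$, it is sufficient to consider strongly exchangeable
collusion channels to derive worst-case error probabilities. Moreover, in the error probability
calculations for random codes it will be sufficient to use the trivial upper bound

$$\Pr[T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}] \le 1\{ p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}}) \}. \tag{2.16}$$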

2.5 Fair Coalitions 

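Two notions of fairness for coalitions will be useful. Denote by $\pi$ a permutation of $\{1, 2, \cdots, K\}$.
Definition 2.5 The collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ is permutation-invariant if

$$p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y}|\mathbf{x}_{m_1}, \cdots, \mathbf{x}_{m_K}) = p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y}|\mathbf{x}_{\pi(m_1)}, \cdots, \mathbf{x}_{\pi(m_K)}), \quad \forall \pi. \tag{2.17}$$

For instance, if $\mathcal{X} = \mathcal{Y}$ and $K = 2$, the collusion channel

$$p_{\mathbf{Y}|\mathbf{X}_1 \mathbf{X}_2}(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) = \frac{1}{2}\left[ 1\{\mathbf{y} = \mathbf{x}_1\} + 1\{\mathbf{y} = \mathbf{x}_2\} \right] \tag{2.18}$$

is permutation-invariant. Given $\mathbf{x}_1, \mathbf{x}_2$, there are two equally likely choices for the pirated copy,
namely $\mathbf{y} = \mathbf{x}_1$ and $\mathbf{y} = \mathbf{x}_2$. Note that one colluder carries full risk and the other one zero risk. A
stronger definition of fairness (which will not be needed in this paper) would require some kind of
ergodic behavior of the inputs and output of the collusion channel.

Definition 2.6 The collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ is first-order fair if $\Pr[p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathscr{W}_K^{\mathrm{fair}}(p_{\mathbf{x}_{\mathcal{K}}})] = 1$.

For any first-order fair collusion channel, the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$ is invariant to permutations of
the colluders, with probability 1. For instance, if $\mathcal{X} = \mathcal{Y}$ and $K = 2$, any collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$
resulting in the conditional type $p_{\mathbf{y}|\mathbf{x}_1\mathbf{x}_2}(y|x_1, x_2) = \frac{1}{2}\left[ 1\{y = x_1\} + 1\{y = x_2\} \right]$ is first-order fair.
This is an interleaving attack in which each colluder contributes exactly $N/2$ samples (in whichever
order) to the pirated copy.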

A first-order fair collusion channel is not necessarily permutation-invariant, and vice versa.
Further, if a collusion channel is first-order fair and strongly exchangeable, then it is also
permutation-invariant. However, the converse is not true. For instance, the collusion channel of (2.18) is
permutation-invariant and strongly exchangeable but not first-order fair because the conditional
type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}(y|x_1, x_2)$ is given by either $1\{y = x_1\}$ or $1\{y = x_2\}$, neither of which is
permutation-invariant.
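
The distinction between the two fairness notions can be checked on toy sequences for $K = 2$. The sketch below computes the conditional type of a forgery given two marked copies and tests whether it is symmetric in the two colluders. The function names are hypothetical, and the symmetry test assumes that the joint type of the two copies is itself symmetric, so that both $(a, b)$ and $(b, a)$ occur.

```python
from collections import Counter
from fractions import Fraction


def conditional_type(x1, x2, y):
    """Conditional type p(c | a, b) of y given (x1, x2), as exact fractions."""
    joint = Counter(zip(x1, x2, y))
    inputs = Counter(zip(x1, x2))
    return {(a, b, c): Fraction(n, inputs[(a, b)]) for (a, b, c), n in joint.items()}


def is_symmetric_type(ctype):
    """True if p(c | a, b) == p(c | b, a) for every entry of the conditional type."""
    return all(ctype.get((b, a, c), 0) == p for (a, b, c), p in ctype.items())


x1 = [0, 0, 1, 1]
x2 = [1, 1, 0, 0]

# Outcome y = x1 of the channel (2.18): one colluder supplies every sample.
copy_type = conditional_type(x1, x2, x1)

# Balanced interleaving: each colluder supplies exactly N/2 samples.
interleaved_type = conditional_type(x1, x2, [0, 1, 1, 0])
```

The first conditional type is not symmetric, matching the observation that the channel (2.18) is not first-order fair, while the balanced interleaving produces a symmetric conditional type.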
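2.6 Error Probabilities

Let $\mathcal{K}$ be the coalition and $\hat{\mathcal{K}} = g_N(\mathbf{Y}, \mathbf{S}, V)$ the decoder's output. There are several error
probabilities of interest: the probability of false positives (one or more innocent users are accused):

$$P_{\mathrm{FP}}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[\hat{\mathcal{K}} \setminus \mathcal{K} \ne \emptyset], \tag{2.19}$$

the probability of missed detection for a specific coalition member $m \in \mathcal{K}$:

$$P_{e,m}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[m \notin \hat{\mathcal{K}}],$$

the probability of failing to catch a single colluder:

$$P_e^{\mathrm{one}}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[\hat{\mathcal{K}} \cap \mathcal{K} = \emptyset], \tag{2.20}$$

and the probability of failing to catch the full coalition:

$$P_e^{\mathrm{all}}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[\mathcal{K} \not\subseteq \hat{\mathcal{K}}]. \tag{2.21}$$

The error criteria (2.20) and (2.21) will be referred to as the detect-one and detect-all criteria,
respectively.

The above error probabilities may be written in the explicit form

$$P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \sum_{v, \mathbf{s}, \mathbf{x}_{\mathcal{K}}, \mathbf{y}} p_V(v)\, p_S^N(\mathbf{s}) \Big( \prod_{m \in \mathcal{K}} 1\{\mathbf{x}_m = f_N(\mathbf{s}, v, m)\} \Big)\, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y}|\mathbf{x}_{\mathcal{K}})\, 1\{\mathcal{E}\} \tag{2.22}$$

where the error event $\mathcal{E}$ is given by $\mathcal{E}_{\mathrm{FP}} = \{g_N(\mathbf{y}, \mathbf{s}, v) \setminus \mathcal{K} \ne \emptyset\}$, or $\mathcal{E}^{\mathrm{one}} = \{g_N(\mathbf{y}, \mathbf{s}, v) \cap \mathcal{K} = \emptyset\}$, or
$\mathcal{E}^{\mathrm{all}} = \{\mathcal{K} \not\subseteq g_N(\mathbf{y}, \mathbf{s}, v)\}$, when $P_e$ is given by (2.19), (2.20), and (2.21), respectively. The worst-case
probability is given by

$$P_e(f_N, g_N, \mathscr{W}_K) = \max_{p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}} P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}})$$

where the maximum is over all feasible collusion channels, i.e., such that (2.10) holds.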
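
The three error events can be expressed directly as set operations on the decoder output and the true coalition. The following sketch is illustrative only; the function names are hypothetical.

```python
def false_positive(decoded, coalition):
    """Event E_FP: at least one accused user is outside the coalition."""
    return bool(set(decoded) - set(coalition))


def miss_one(decoded, coalition):
    """Detect-one error: no member of the coalition is accused."""
    return not (set(decoded) & set(coalition))


def miss_all(decoded, coalition):
    """Detect-all error: at least one member of the coalition escapes detection."""
    return not (set(coalition) <= set(decoded))


coalition = {1, 2}
decoded = {2, 5}   # catches colluder 2, misses colluder 1, wrongly accuses user 5
events = (
    false_positive(decoded, coalition),
    miss_one(decoded, coalition),
    miss_all(decoded, coalition),
)
```

For a nonempty coalition, a detect-one error always implies a detect-all error, and the empty decoder output never produces a false positive.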

Maximum vs average error probability. The error probabilities (2.19)-(2.21) generally
depend on $\mathcal{K}$. Prop. 2.1 below states that (a) in order to make them independent of $\mathcal{K}$ and
provide guarantees on error probability for any coalition, one may use RP codes, and (b) random
permutations of fingerprint assignments cannot increase the average error probability of any code.
Let $(\tilde{f}_N, \tilde{g}_N)$ be an arbitrary code and $(f_N, g_N)$ the RP code of (2.7), obtained using $(\tilde{f}_N, \tilde{g}_N)$ as a
prototype. Let $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ be an arbitrary collusion channel when coalition $\mathcal{K}$ is in effect. Given any
other coalition $\mathcal{K}' = \pi(\mathcal{K})$ of the same size, let $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}'}}$ be the corresponding collusion channel,
obtained by applying (2.8), where $\pi$ is now a permutation of $\{1, \cdots, 2^{NR}\}$.

Proposition 2.1 For any code $(\tilde{f}_N, \tilde{g}_N)$ and collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$, we have

$$\forall \mathcal{K}' : \quad P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}'}}) = P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) \le \max_{\mathcal{K}'} P_e(\tilde{f}_N, \tilde{g}_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}'}}) \tag{2.23}$$

where $(f_N, g_N)$ is the RP code of (2.7), and $P_e$ denotes any of the error probability criteria (2.19),
(2.20), and (2.21).
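Proof. First consider the detect-one error criterion of (2.20): an error arises if $g_N(\mathbf{Y}, \mathbf{S}, V) \cap \mathcal{K} = \emptyset$.
Given a prototype fingerprinting code $(\tilde{f}_N, \tilde{g}_N)$, the detect-one error probability when coalition $\mathcal{K}$
is in effect is given by

$$\begin{aligned}
P_e^{\mathrm{one}}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) &= \Pr[g_N(\mathbf{Y}, \mathbf{S}, V) \cap \mathcal{K} = \emptyset] \\
&= \Pr[\pi(\tilde{g}_N(\mathbf{Y}, \mathbf{S}, W)) \cap \mathcal{K} = \emptyset] \\
&= \Pr[\tilde{g}_N(\mathbf{Y}, \mathbf{S}, W) \cap \pi^{-1}(\mathcal{K}) = \emptyset] \\
&= \mathbb{E}_{\mathbf{Y}, \mathbf{S}, W}\, \underbrace{\frac{1}{(2^{NR})!} \sum_{\pi} 1\{\tilde{g}_N(\mathbf{Y}, \mathbf{S}, W) \cap \pi^{-1}(\mathcal{K}) = \emptyset\}}_{\text{independent of } \mathcal{K}}
\end{aligned} \tag{2.24}$$

which is independent of $\mathcal{K}$, by virtue of the uniform distribution on $\pi$. The derivation for the
detect-all and the false-positive error probabilities is analogous to (2.24). This establishes the first
equality in (2.23). The inequality is proved similarly. □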
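
The inner average over permutations in (2.24) can be verified by brute force for a very small number of users. The sketch below fixes a hypothetical prototype decoder output and counts, for each coalition of a given size, the fraction of user permutations under which the relabeled output misses the whole coalition. It only illustrates the symmetry argument for a fixed decoder output, not the full expectation over $\mathbf{Y}$, $\mathbf{S}$, and $W$; the function name is an assumption.

```python
from fractions import Fraction
from itertools import combinations, permutations


def rp_detect_one_error(proto_output, coalition, num_users):
    """Fraction of permutations pi for which pi(proto_output) misses the coalition."""
    perms = list(permutations(range(num_users)))
    misses = sum(
        1 for pi in perms
        if not ({pi[m] for m in proto_output} & set(coalition))
    )
    return Fraction(misses, len(perms))


# Four users, prototype output {0}, all coalitions of size two.
values = {
    coalition: rp_detect_one_error({0}, coalition, 4)
    for coalition in combinations(range(4), 2)
}
```

Every coalition of the same size yields the same value, which is the symmetry that the uniform permutation of fingerprint assignments is meant to provide.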

False Positives vs False Negatives. The tradeoff between false positives and false negatives
is central to statistical detection theory (the Neyman-Pearson problem) and list decoding [14].
Note that in the classical formulation of list decoding [15, p. 166], an error is declared only if
the message sent does not appear on the decoder's output list. The false-negative error exponent
increases with list size and approaches the sphere packing exponent if the list size is allowed to grow
subexponentially with $N$. This classical formulation does not include a cost for "false positives".



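2.7 Strongly Exchangeable Collusion Channels

Prop. 2.2 below states that randomly modulated codes (Def. 2.2) and strongly exchangeable
channels (Def. 2.4) satisfy a certain equilibrium property: neither the fingerprint embedder nor the
coalition has interest in deviating from those strategies. Let $(\tilde{f}_N, \tilde{g}_N)$ be an arbitrary code and
$(f_N, g_N)$ the RM code of (2.6), obtained using $(\tilde{f}_N, \tilde{g}_N)$ as a prototype. Given any feasible collusion
channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$, denote by

$$\bar{p}_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y}|\mathbf{x}_{\mathcal{K}}) = \frac{1}{N!} \sum_{\pi} p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\pi\mathbf{y}|\pi\mathbf{x}_{\mathcal{K}}) \tag{2.25}$$

the permutation-averaged channel, which is feasible and strongly exchangeable.
Proposition 2.2 For any code $(\tilde{f}_N, \tilde{g}_N)$ and collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$, we have

$$P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = P_e(f_N, g_N, \bar{p}_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = P_e(\tilde{f}_N, \tilde{g}_N, \bar{p}_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) \le \max_{p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}} P_e(\tilde{f}_N, \tilde{g}_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) \tag{2.26}$$

where $(f_N, g_N)$ is the RM code of (2.6) and $P_e$ denotes any of the error probability criteria (2.19),
(2.20), and (2.21).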

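Proof. First consider the detect-one error criterion of (2.20): an error arises if $g_N(\mathbf{Y}, \mathbf{S}, V) \cap \mathcal{K} = \emptyset$.
For any fixed $\mathcal{K}$, the detect-one error probability is an average over all possible permutations $\pi$
and the other random variables $\mathbf{S}$, $\mathbf{Y}$:

$$\begin{aligned}
P_e^{\mathrm{one}}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}})
&\overset{(a)}{=} \frac{1}{N!} \sum_{\pi} \sum_{w, \mathbf{s}, \mathbf{x}_{\mathcal{K}}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \Big( \prod_{m \in \mathcal{K}} 1\{\pi\mathbf{x}_m = \tilde{f}_N(\pi\mathbf{s}, w, m)\} \Big)\, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y}|\mathbf{x}_{\mathcal{K}}) \\
&\qquad \times 1\{\tilde{g}_N(\pi\mathbf{y}, \pi\mathbf{s}, w) \cap \mathcal{K} = \emptyset\} \\
&\overset{(b)}{=} \frac{1}{N!} \sum_{\pi} \sum_{w, \mathbf{s}, \mathbf{x}_{\mathcal{K}}, \mathbf{y}} p_W(w)\, p_S^N(\pi^{-1}\mathbf{s}) \Big( \prod_{m \in \mathcal{K}} 1\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \Big)\, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\pi^{-1}\mathbf{y}|\pi^{-1}\mathbf{x}_{\mathcal{K}}) \\
&\qquad \times 1\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\} \\
&\overset{(c)}{=} \frac{1}{N!} \sum_{\pi} \sum_{w, \mathbf{s}, \mathbf{x}_{\mathcal{K}}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \Big( \prod_{m \in \mathcal{K}} 1\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \Big)\, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\pi^{-1}\mathbf{y}|\pi^{-1}\mathbf{x}_{\mathcal{K}}) \\
&\qquad \times 1\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\} \\
&= \sum_{w, \mathbf{s}, \mathbf{x}_{\mathcal{K}}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \Big( \prod_{m \in \mathcal{K}} 1\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \Big)\, \bar{p}_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y}|\mathbf{x}_{\mathcal{K}})\, 1\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\} \\
&= P_e^{\mathrm{one}}(\tilde{f}_N, \tilde{g}_N, \bar{p}_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}})
\end{aligned} \tag{2.27}$$

where (a) holds by definition of the RM code, (b) is obtained by applying the change of variables
$\mathbf{z} \leftarrow \pi\mathbf{z}$ to the sequences $\mathbf{s}, \mathbf{x}_{\mathcal{K}}, \mathbf{y}$, and (c) follows from the fact that $p_S^N(\mathbf{s}) = p_S^N(\pi\mathbf{s})$. The derivation for the
detect-all and the false-positive error probabilities is analogous to (2.27). This establishes that the
first and third expressions in (2.26) are equal. The remaining equality and the inequality are proved similarly. □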

2.8 Risk for Fair Coalitions 

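The maximum and the minimum of the error probabilities $P_{e,m}$, $m \in \mathcal{K}$, will be useful. The
maximum value,

$$\overline{P}_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \max_{m \in \mathcal{K}} P_{e,m}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}), \tag{2.28}$$

is the conventional error criterion for information transmission. However, the minimum value,

$$\underline{P}_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \min_{m \in \mathcal{K}} P_{e,m}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}), \tag{2.29}$$

is more relevant to the coalition because it represents the risk of their most vulnerable member.
Note that

$$P_e^{\mathrm{one}}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) \le \underline{P}_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) \le \overline{P}_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) \le P_e^{\mathrm{all}}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}).$$

While it is conceivable that some colluders could be tricked or coerced into taking a higher risk
than others, such a strategy is not secure because the whole coalition would be at risk if some of
its members, especially the vulnerable ones, are caught. The proof of the following proposition is
elementary.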
Proposition 2.3 For randomly permuted codes (Def. 2.3), if the collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ is
permutation-invariant, then all colluders incur the same risk:

$$\underline{P}_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \overline{P}_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}).$$

The proof of the following proposition is omitted because it is similar to that of Prop. 2.2.

Proposition 2.4 For randomly permuted codes, the maximum of the error probability criteria
(2.19), (2.20), and (2.21) is achieved by a permutation-invariant collusion channel of the form



2.9 Capacity 

Having defined the detect-one and detect-all error criteria and feasible classes of codes and collusion 
channels, we may now define the corresponding notions of fingerprinting capacity. 

Definition 2.7 A rate $R$ is achievable for embedding distortion $D_1$, collusion class $\mathscr W_K$, and
detect-one criterion if there exists a sequence of $(N, \lceil 2^{NR} \rceil)$ randomized codes $(f_N, g_N)$ with
maximum embedding distortion $D_1$, such that both $P_e^{\mathrm{one}}(f_N, g_N, \mathscr W_K)$ and $P_{FP}(f_N, g_N, \mathscr W_K)$ vanish as
$N \to \infty$.

Definition 2.8 A rate $R$ is achievable for embedding distortion $D_1$, collusion class $\mathscr W_K$, and
detect-all criterion if there exists a sequence of $(N, \lceil 2^{NR} \rceil)$ randomized codes $(f_N, g_N)$ with
maximum embedding distortion $D_1$, such that both $P_e^{\mathrm{all}}(f_N, g_N, \mathscr W_K)$ and $P_{FP}(f_N, g_N, \mathscr W_K)$ vanish as
$N \to \infty$.

Definition 2.9 Fingerprinting capacities $C^{\mathrm{one}}(D_1, \mathscr W_K)$ and $C^{\mathrm{all}}(D_1, \mathscr W_K)$ are the suprema of all
achievable rates with respect to the detect-one and detect-all criteria, respectively.

We have $C^{\mathrm{all}}(D_1, \mathscr W_K) \le C^{\mathrm{one}}(D_1, \mathscr W_K)$ because an error event for the detect-one problem is
also an error event for the detect-all problem.



2.10 Random-Coding Exponents 

For a sequence of randomized codes $(f_N, g_N)$, the error exponents are defined as

$$E(R, D_1, \mathscr W_K) = \liminf_{N \to \infty} \left[ -\frac{1}{N} \log P_e(f_N, g_N, \mathscr W_K) \right]$$

where $E$ represents the random coding exponent $E_{FP}$, $E^{\mathrm{one}}$, or $E^{\mathrm{all}}$. Moreover, $E^{\mathrm{all}}(R, D_1, \mathscr W_K) \le E^{\mathrm{one}}(R, D_1, \mathscr W_K)$
because an error event for the detect-one problem is also an error event for the
detect-all problem. We have $E^{\mathrm{all}} = 0$ if the class $\mathscr W_K$ includes channels in which one colluder can
"stay out," i.e., not contribute to the pirated copy.
Fig. 2 gives a preview of $E^{\mathrm{one}}$ and $E_{FP}$ for our random coding scheme, viewed as a function
of the number $K$ of colluders. The false-positive exponent $E_{FP}$ is equal to $\Delta$, for any value of $K$.
The false-negative exponent $E^{\mathrm{one}}$ decreases with $K$, up to some maximum value $K_{R,\Delta}$ where it
becomes zero. For any $K > K_{R,\Delta}$, the decoder outputs $\hat{\mathcal K} = \emptyset$ with high probability, and therefore
reliable decoding of any colluder is impossible.

Fig. 3 illustrates the maximum rate $R(K, \Delta)$ that can be accommodated by the random coding
scheme, for fixed $\Delta$. This rate decreases with $K$ and becomes zero for $K > K_\Delta$. If $\Delta \downarrow 0$, the rate
curve $R(K, \Delta)$ tends to the capacity function $C(K)$. Note that $C(K)$ vanishes as $K \to \infty$ but is
generally positive for any finite $K$; in this case, $\lim_{\Delta \to 0} K_\Delta = \infty$.



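[Figure 2, plot omitted: error exponents versus coalition size $K$, with the level $\Delta$ marked on the vertical axis, the curve $E^{\mathrm{one}}(K, \Delta)$, and the values $K_{\mathrm{nom}}$ and $K_{R,\Delta}$ marked on the horizontal axis.]

Figure 2: False-positive and false-negative error exponents, as a function of coalition size $K$, for
fixed values of $R$ and $\Delta$.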

2.11 Memoryless Collusion Channels 

As an alternative to the collusion channels subject to the hard constraint $\Pr[p_{\mathbf y|\mathbf x_{\mathsf K}} \in \mathscr W_K(p_{\mathbf x_{\mathsf K}})] = 1$,
we may consider memoryless collusion channels:

$$p_{\mathbf Y|\mathbf X_{\mathsf K}}(\mathbf y|\mathbf x_{\mathsf K}) = \prod_{t=1}^{N} p_{Y|X_{\mathsf K}}(y_t|x_{\mathsf K,t}) \qquad (2.30)$$

where $p_{Y|X_{\mathsf K}} \in \mathscr W_K(p_{X_{\mathsf K}})$, viewed as a compound class of channels [11]. As we shall see, there is
a strong link between the two problems in the form of Lemma 3.2, which is used to establish our
converse theorems; also see Sec. 6.
[Figure 3, plot omitted: rate on the vertical axis versus coalition size $K$.]

Figure 3: Capacity $C$ and achievable rate $R$ (for false-positive error exponent equal to $\Delta$), as a
function of coalition size $K$.

3 Fingerprinting Capacity 

In this section we present fingerprinting capacity formulas under the detect-one and detect-all error 
criteria. To put these results in context, let us first recall related results for MACs. In the absence 
of side information, the capacity region of the MAC was determined by Ahlswede [16] and Liao [17]. 
For the MAC with common side information at the transmitter and receiver, some very general 
capacity formulas were derived by Das and Narayan [18] under the assumption that S is an ergodic 
process. In some special cases these formulas can be single-letterized. For fingerprinting with i.i.d.
S and coalition size equal to 2, bounds on capacity were derived in [4,5]. Thus the presence of the 
side information S causes difficulties in deriving single-letter capacity formulas for both MAC and 
fingerprinting problems. 

The proof of the converse under the detect-all criterion is based on the standard Fano inequality. 
Surprisingly, Fano's inequality does not seem to be the right tool to prove the converse under the 
detect-one criterion [19]. The direction we have pursued instead is a strong converse. An initial 
attempt in this direction appeared in [10], but the resulting upper bound on capacity is loose. 
Our proof is based on explicit sphere-packing arguments, specifically the fact that typical sets 
for Y given the embedded fingerprints cannot have too much statistical overlap, otherwise reliable 
decoding is impossible. The tools used here are different from those used for classical problems such 
as the single-user discrete memoryless channel [15, pp. 173-176] and the MAC [20]. The use of a
detect-one criterion requires a different machinery. Without loss of optimality it is assumed that
randomly permuted codes (Def. 2.3) are used, in which case average and maximal error probabilities
over all possible coalitions coincide. The wringing techniques of [20] are unnecessary here, and a 
simpler technique is used to deal with codeword pairs whose self-information score is well above 
average. 

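The following lemma will be useful throughout this paper. Its proof appears in Appendix A.

Lemma 3.1 Let $\mathsf K = \{1, 2, \dots, K\}$ and assume the distribution of $(X_{\mathsf K}, Z)$ is invariant to
permutations of $\mathsf K$. Then for any nested sets $\mathsf A \subseteq \mathsf B \subseteq \mathsf K$, we have

$$\frac{1}{|\mathsf A|} H(X_{\mathsf A}|Z X_{\mathsf K \setminus \mathsf A}) \le \frac{1}{|\mathsf B|} H(X_{\mathsf B}|Z X_{\mathsf K \setminus \mathsf B}), \qquad (3.1)$$
$$\frac{1}{|\mathsf A|} H(X_{\mathsf A}|Z) \ge \frac{1}{|\mathsf B|} H(X_{\mathsf B}|Z). \qquad (3.2)$$

Both inequalities hold with equality if $X_k$, $k \in \mathsf K$, are conditionally independent given $Z$.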
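The two inequalities of Lemma 3.1 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it takes $K = 3$ binary variables, a trivial $Z$, and a hypothetical exchangeable p.m.f. (`weight_prob` is made up for the example), and evaluates the normalized entropies on both sides of (3.1) and (3.2).

```python
from itertools import product
from math import log2

def entropy_of(joint, axes):
    """Entropy (bits) of the marginal of `joint` on the coordinates in `axes`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

# Hypothetical exchangeable p.m.f. on {0,1}^3: the probability of an outcome
# depends only on its Hamming weight, so it is invariant to permutations.
weight_prob = {0: 0.40, 1: 0.10, 2: 0.05, 3: 0.15}
joint = {x: weight_prob[sum(x)] for x in product((0, 1), repeat=3)}

K = 3
h_all = entropy_of(joint, range(K))

# (3.2) with trivial Z: H(X_A)/|A| for |A| = 1, 2, 3 (should not increase).
normalized_joint = [entropy_of(joint, range(a)) / a for a in range(1, K + 1)]

# (3.1) with trivial Z: H(X_A | X_{K\A})/|A| for |A| = 1, 2, 3 (should not decrease).
# H(X_A | X_rest) = H(X_K) - H(X_rest), with A taken as the first a coordinates.
normalized_cond = [(h_all - entropy_of(joint, range(a, K))) / a for a in range(1, K + 1)]

print("H(X_A)/|A|          :", [round(v, 4) for v in normalized_joint])
print("H(X_A|X_rest)/|A|   :", [round(v, 4) for v in normalized_cond])
```

Because the p.m.f. is exchangeable, using the first `a` coordinates as the set $\mathsf A$ loses no generality in this example.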

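We will derive two simple formulas by application of this lemma. First, applying (3.1) with
$Z = (Y, S, W)$ and (3.2) with $Z = (S, W)$ and subtracting the first inequality from the second, we
obtain

$$\frac{1}{|\mathsf A|} I(X_{\mathsf A}; Y X_{\mathsf K \setminus \mathsf A}|SW) \ge \frac{1}{|\mathsf B|} I(X_{\mathsf B}; Y X_{\mathsf K \setminus \mathsf B}|SW), \quad \forall \mathsf A \subseteq \mathsf B \subseteq \mathsf K \qquad (3.3)$$

with equality if $X_k$, $k \in \mathsf K$, are conditionally independent given $Z$. Second, for $X_k$, $k \in \mathsf K$,
conditionally i.i.d. given $(S, W)$, we have

$$I(X_1; Y|S, W) = H(X_1|S, W) - H(X_1|Y, S, W)$$
$$= \frac{1}{K} H(X_{\mathsf K}|S, W) - H(X_1|Y, S, W)$$
$$\le \frac{1}{K} H(X_{\mathsf K}|S, W) - \frac{1}{K} H(X_{\mathsf K}|Y, S, W)$$
$$= \frac{1}{K} I(X_{\mathsf K}; Y|S, W) \qquad (3.4)$$

where the inequality follows from (3.2) with $Z = (Y, S, W)$.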

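Now consider an auxiliary random variable $W$ defined over an alphabet $\mathcal W = \{1, 2, \dots, L\}$, and
independent of $S$. Define the set of conditional p.m.f.'s

$$\mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1) = \left\{ p_{X_{\mathsf K} W|S} = p_W \prod_{k \in \mathsf K} p_{X_k|SW} \,:\, p_{X_1|SW} = \cdots = p_{X_K|SW}, \ \mathbb E\, d(S, X_1) \le D_1 \right\} \qquad (3.5)$$

and the functions

$$C_L^{\mathrm{one}}(D_1, \mathscr W_K) = \max_{p_{X_{\mathsf K} W|S} \in \mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1)} \ \min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K(p_{X_{\mathsf K}})} \frac{1}{K} I(X_{\mathsf K}; Y|S, W) \qquad (3.6)$$

$$C_L^{\mathrm{all}}(D_1, \mathscr W_K) = \max_{p_{X_{\mathsf K} W|S} \in \mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1)} \ \min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K(p_{X_{\mathsf K}})} \ \min_{\mathsf A \subseteq \mathsf K} \frac{1}{|\mathsf A|} I(X_{\mathsf A}; Y|S, X_{\mathsf K \setminus \mathsf A}, W). \qquad (3.7)$$

Using the same derivation as in Lemma 2.1 of [13], it is easily shown that $C_L^{\mathrm{one}}(D_1, \mathscr W_K)$ and
$C_L^{\mathrm{all}}(D_1, \mathscr W_K)$ are nondecreasing functions of $L$ and converge to finite limits:

$$C^{\mathrm{one}}(D_1, \mathscr W_K) = \lim_{L \to \infty} C_L^{\mathrm{one}}(D_1, \mathscr W_K) \qquad (3.8)$$

$$C^{\mathrm{all}}(D_1, \mathscr W_K) = \lim_{L \to \infty} C_L^{\mathrm{all}}(D_1, \mathscr W_K). \qquad (3.9)$$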
Moreover, the gap to each limit may be bounded by a polynomial function of $L$; see [13, Sec. 3.5]
for a similar derivation. The basic idea is to discretize each $\mathscr W_K(p_{X_{\mathsf K}})$ to a fine grid of $L$ collusion
channels. By application of Caratheodory's theorem, the supremum of $C_L$ over $L$ is achieved by
L < \S\ \X\ + L. The gap between the minimum of the cost function over '^k{px^) and over its 
discrete approximation can be bounded by cL"!-^! ^1"^! ^ where c is a constant. 

The following lemma will be used to prove Theorems 3.4 and 3.3 below. Its proof is given in
Appendix B and borrows ideas from [13, Theorem 3.7].

Lemma 3.2 Consider the compound family $\mathscr W_K(p_{X_{\mathsf K}})$ of memoryless channels in (2.30). Under
both the detect-one and detect-all criteria, the compound capacity for this problem is an upper
bound on the capacity for the main problem of (2.10), in which $p_{\mathbf y|\mathbf x_{\mathsf K}} \in \mathscr W_K(p_{\mathbf x_{\mathsf K}})$ with probability 1.

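Theorem 3.3 Fingerprinting capacity is given by $C^{\mathrm{all}}(D_1, \mathscr W_K)$ under the detect-all criterion.

If the colluders select a fair collusion channel, as is their collective interest, then $\mathscr W_K(p_{X_{\mathsf K}}) =
\mathscr W_K^{\mathrm{fair}}(p_{X_{\mathsf K}})$, and

$$C^{\mathrm{all}}(D_1, \mathscr W_K^{\mathrm{fair}}) = C^{\mathrm{one}}(D_1, \mathscr W_K^{\mathrm{fair}}). \qquad (3.10)$$

The same capacity is obtained for the compound memoryless class of (2.30).

The proof of the converse (rates higher than capacity are not achievable) is given in Sec. 7.
Achievability is proved in Theorem 5.2. Also note that $C^{\mathrm{all}}(D_1, \mathscr W_K^{\mathrm{fair}}) \ge C^{\mathrm{all}}(D_1, \mathscr W_K)$ generally.
In fact $C^{\mathrm{all}}(D_1, \mathscr W_K) = 0$ if $\mathscr W_K(p_{X_{\mathsf K}})$ contains conditional p.m.f.'s $p_{Y|X_{\mathsf K}}$ such that $Y$ is independent
of one of the inputs $X_k$, $k \in \mathsf K$.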

Theorem 3.4 Fingerprinting capacity is given by

$$C^{\mathrm{one}}(D_1, \mathscr W_K) = C^{\mathrm{one}}(D_1, \mathscr W_K^{\mathrm{fair}}) \qquad (3.11)$$

under the detect-one criterion. Moreover, for any $R > C^{\mathrm{one}}(D_1, \mathscr W_K)$, we have

$$\lim_{N \to \infty} \ \min_{f_N, g_N} \ \max\{P_e^{\mathrm{one}}(f_N, g_N, \mathscr W_K), P_{FP}(f_N, g_N, \mathscr W_K)\} = 1.$$

The same results are obtained for the compound memoryless class of (2.30).

The proof of the converse is given in Sec. 8 and achievability is proved in Theorem 5.2.
The lower bounds on fingerprinting capacity derived in [4,5] are of the form (3.6) with $L = 1$, i.e.,
the auxiliary random variable $W$ is degenerate. Since the payoff function $\min_{p_{Y|X_{\mathsf K}}} I(X_{\mathsf K}; Y|S)$
is generally nonconcave with respect to $p_{X|S}$, a randomized strategy in which the variable $p_{X|S}$ is
randomized will generally outperform a deterministic strategy in which $p_{X|S}$ is fixed. The auxiliary
random variable $W$ plays the role of selector of $p_{X|S}$ in this mutual-information game.

Apparently the benefits of this randomization can be dramatic for large $K$. For the Boneh-Shaw
problem, the value of the maxmin of (3.6) with $L = 1$ is $C_1^{\mathrm{one}}(D_1, \mathscr W_K) = K^{-1} 2^{-(K-1)}$. However,
Tardos' scheme [9] uses $\mathcal W = [0, 1]$ and achieves a rate $O(K^{-2})$ which is therefore much larger than
$C_1^{\mathrm{one}}(D_1, \mathscr W_K)$ for large $K$. The rate of his code is necessarily a lower bound on $C^{\mathrm{one}}(D_1, \mathscr W_K)$.
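To get a feel for the size of this gap, the arithmetic can be tabulated. The sketch below compares the $L = 1$ value $K^{-1} 2^{-(K-1)}$ quoted above with a generic rate of order $K^{-2}$. The constant `c` is a placeholder introduced for illustration only (it is not the constant of Tardos' construction), so the output shows how the ratio scales with $K$, not actual achievable rates.

```python
def degenerate_w_value(k: int) -> float:
    """Maxmin value K^{-1} 2^{-(K-1)} quoted in the text for L = 1."""
    return 1.0 / (k * 2 ** (k - 1))

def inverse_square_rate(k: int, c: float = 1.0) -> float:
    """A rate of order K^{-2}; c is a hypothetical placeholder constant."""
    return c / k ** 2

for k in (2, 5, 10, 20):
    low = degenerate_w_value(k)
    high = inverse_square_rate(k)
    # With c = 1 the ratio equals 2^(K-1) / K, which grows exponentially in K.
    print(f"K={k:2d}  L=1 value={low:.3e}  c/K^2={high:.3e}  ratio={high / low:.1f}")
```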
4 Simple Fingerprint Decoder



This section introduces our random coding scheme and a simple decoder that tests candidate
fingerprints one by one. This decoder is closely related to the correlation decoders that have been
used in Tardos' paper [9] and in the signal processing literature. (Such decoders evaluate a measure
of correlation between the received sequence and the individual fingerprints, and retain the
fingerprints whose correlation score is above a certain threshold.) We derive error exponents for this
scheme and establish maximum rates for reliable decoding. These rates fall short of the fingerprinting
capacities $C^{\mathrm{one}}(D_1, \mathscr W_K)$ and $C^{\mathrm{all}}(D_1, \mathscr W_K)$ given by Theorems 3.4 and 3.3. The derivations are
given for the case without side information ($S = \emptyset$) or distortion constraint ($D_1$) for the fingerprint
distributor. This setup is directly applicable to the Boneh-Shaw model, and the derivations are
much easier to follow. This setup also contains several key ingredients of the error analysis for the
more elaborate joint fingerprint decoder of Sec. 5. In particular, the false-negative error exponents
are determined by the worst conditional type $T_{\mathbf y \mathbf x_{\mathsf K}|\mathbf w}$.

4.1 Codebook 

The scheme is designed to achieve a false-positive error exponent equal to $\Delta$ and assumes a nominal
value $K_{\mathrm{nom}}$ for coalition size. (Reliable decoding will generally be possible for $K > K_{\mathrm{nom}}$ though.)
These parameters are used to identify a joint type class $T^*_{\mathbf x \mathbf w}$ defined below (9.4). An arbitrarily
large $L$ is selected, defining an alphabet $\mathcal W = \{1, 2, \dots, L\}$. A random constant-composition code
$\mathcal C(\mathbf w) = \{\mathbf x_m, 1 \le m \le 2^{NR}\}$ is generated for each $\mathbf w \in T^*_{\mathbf w}$ by drawing $2^{NR}$ sequences independently
and uniformly from the conditional type class $T^*_{\mathbf x|\mathbf w}$.

4.2 Encoding Scheme

A sequence $\mathbf W$ is drawn uniformly from the type class $T^*_{\mathbf w}$ and shared with the receiver. User $m$ is
assigned codeword $\mathbf x_m$ from $\mathcal C(\mathbf W)$, for $1 \le m \le 2^{NR}$.

4.3 Decoding Scheme

The receiver makes an innocent/guilty decision on each user independently of the other users, and
there lies the simplicity but also the suboptimality of this decoder. Specifically, the estimated
coalition $\hat{\mathcal K}$ is the collection of all $m$ such that

$$I(\mathbf x_m; \mathbf y|\mathbf w) > R + \Delta. \qquad (4.1)$$

If no such $m$ is found, the receiver outputs $\hat{\mathcal K} = \emptyset$. The users whose empirical mutual information
score exceeds the threshold $R + \Delta$ are declared guilty.
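The threshold rule (4.1) is simple enough to sketch in a few lines. The code below is a minimal illustration, not the paper's implementation: it computes the empirical conditional mutual information from the joint type of three sequences and accuses every user whose score exceeds the threshold. The function names, the toy codebook, and the 0-based user indices are assumptions made for the example; logarithms are base 2.

```python
from collections import Counter
from math import log2

def empirical_cond_mutual_info(x, y, w):
    """Empirical I(x; y | w) in bits, computed from the joint type of (x, y, w)."""
    n = len(x)
    assert n == len(y) == len(w) and n > 0
    n_xyw = Counter(zip(x, y, w))
    n_xw = Counter(zip(x, w))
    n_yw = Counter(zip(y, w))
    n_w = Counter(w)
    total = 0.0
    for (a, b, c), cnt in n_xyw.items():
        # p(a,b|c) / (p(a|c) p(b|c)) expressed with raw counts
        total += cnt / n * log2(cnt * n_w[c] / (n_xw[(a, c)] * n_yw[(b, c)]))
    return total

def threshold_decode(codebook, y, w, rate, delta):
    """Return the set of (0-based) user indices with score above rate + delta."""
    return {m for m, x in enumerate(codebook)
            if empirical_cond_mutual_info(x, y, w) > rate + delta}

# Toy example (hypothetical data): user 0's codeword is copied to the output.
w = [0, 0, 0, 0, 1, 1, 1, 1]
codebook = [
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
y = list(codebook[0])
print(threshold_decode(codebook, y, w, rate=0.25, delta=0.25))
```

Each user is scored in isolation, which is why the cost is linear in the number of users, and also why the rule cannot exploit the joint statistics of several codewords.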

4.4 Error Exponents 

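Theorem 4.1 below gives the false-positive and false-negative error exponents for this coding scheme.
These exponents are given in terms of the functions defined below.

Define the set of conditional p.m.f.'s for $X_{\mathsf K}$ given $W$ whose conditional marginals are the same
for all components of $X_{\mathsf K}$:

$$\mathscr M(p_{X|W}) = \{p_{X_{\mathsf K}|W} \,:\, p_{X_m|W} = p_{X|W}, \ \forall m \in \mathsf K\}.$$

Denote by $\mathscr P_{XW}(L)$ the set of p.m.f.'s $p_{XW}$ defined over $\mathcal X \times \mathcal W$. Define for each $m \in \mathsf K$ the set of
conditional p.m.f.'s

$$\mathscr P_{Y X_{\mathsf K}|W}(p_{XW}, \mathscr W_K, R, L, m) = \{\tilde p_{Y X_{\mathsf K}|W} \,:\, \tilde p_{X_{\mathsf K}|W} \in \mathscr M(p_{X|W}), \ \tilde p_{Y|X_{\mathsf K}} \in \mathscr W_K(p_{X_{\mathsf K}}), \ I_{\tilde p_{Y X_{\mathsf K}|W}\, p_W}(X_m; Y|W) \le R\} \qquad (4.2)$$

and the pseudo sphere packing exponent

$$E_{psp,m}(R, L, p_{XW}, \mathscr W_K) = \min_{\tilde p_{Y X_{\mathsf K}|W} \in \mathscr P_{Y X_{\mathsf K}|W}(p_{XW}, \mathscr W_K, R, L, m)} D(\tilde p_{Y X_{\mathsf K}|W} \,\|\, \tilde p_{Y|X_{\mathsf K}}\, p^K_{X|W} \,|\, p_W). \qquad (4.3)$$

The terminology pseudo sphere-packing exponent is used because despite its superficial resemblance
to a sphere-packing exponent [11], (4.3) does not provide a fundamental asymptotic lower bound
on error probability.

Taking the maximum and minimum of $E_{psp,m}$ above over $m \in \mathsf K$, we respectively define

$$\overline{E}_{psp}(R, L, p_{XW}, \mathscr W_K) = \max_{m \in \mathsf K} E_{psp,m}(R, L, p_{XW}, \mathscr W_K), \qquad (4.4)$$
$$\underline{E}_{psp}(R, L, p_{XW}, \mathscr W_K) = \min_{m \in \mathsf K} E_{psp,m}(R, L, p_{XW}, \mathscr W_K). \qquad (4.5)$$

If these expressions are evaluated for the set $\mathscr W_K^{\mathrm{fair}}$, which is permutation invariant, then (4.2) and
(4.3) are independent of $m \in \mathsf K$, and the expressions (4.4) and (4.5) coincide. Define

$$E_{psp}(R, L, \mathscr W_K) = \max_{p_{XW} \in \mathscr P_{XW}(L)} E_{psp,m}(R, L, p_{XW}, \mathscr W^{\mathrm{fair}}_{K_{\mathrm{nom}}}). \qquad (4.6)$$

Denote by $p^*_{XW}$ the maximizer in (4.6), which depends on $R$ and $\mathscr W^{\mathrm{fair}}_{K_{\mathrm{nom}}}$. Finally, define

$$\overline{E}_{psp}(R, L, \mathscr W_K) = \overline{E}_{psp}(R, L, p^*_{XW}, \mathscr W_K), \qquad (4.7)$$
$$\underline{E}_{psp}(R, L, \mathscr W_K) = \underline{E}_{psp}(R, L, p^*_{XW}, \mathscr W_K), \qquad (4.8)$$

where no fairness requirement is imposed on $\mathscr W_K$.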

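Theorem 4.1 The threshold decision rule (4.1) yields the following error exponents.

(i) The false-positive error exponent is

$$E_{FP}(R, L, \mathscr W_K, \Delta) = \Delta. \qquad (4.9)$$

(ii) The detect-one error exponent is

$$E^{\mathrm{one}}(R, L, \mathscr W_K, \Delta) = \overline{E}_{psp}(R + \Delta, L, \mathscr W_K). \qquad (4.10)$$

(iii) The detect-all error exponent is

$$E^{\mathrm{all}}(R, L, \mathscr W_K, \Delta) = \underline{E}_{psp}(R + \Delta, L, \mathscr W_K). \qquad (4.11)$$

(iv) A fair collusion strategy is optimal under the detect-one error criterion: $E^{\mathrm{one}}(R, L, \mathscr W_K, \Delta) =
E^{\mathrm{one}}(R, L, \mathscr W_K^{\mathrm{fair}}, \Delta)$.

(v) The detect-one and detect-all error exponents are the same when the colluders restrict their
choice to fair strategies: $E^{\mathrm{one}}(R, L, \mathscr W_K^{\mathrm{fair}}, \Delta) = E^{\mathrm{all}}(R, L, \mathscr W_K^{\mathrm{fair}}, \Delta)$.

(vi) For $K = K_{\mathrm{nom}}$, the supremum of all rates for which the detect-one error exponent of (4.10)
is positive is given by

$$C^{\mathrm{simple}}(\mathscr W_K) = C^{\mathrm{simple}}(\mathscr W_K^{\mathrm{fair}}) = \lim_{L \to \infty} \ \max_{p_{XW} \in \mathscr P_{XW}(L)} \ \min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K^{\mathrm{fair}}(p_{X_{\mathsf K}})} I(X_1; Y|W) \qquad (4.12)$$

and is achieved by letting $\Delta \to 0$ and $L \to \infty$.

Note. Applying (3.4) with $S = \emptyset$, we have $I(X_1; Y|W) \le \frac{1}{K} I(X_{\mathsf K}; Y|W)$ for any permutation-invariant
$p_{Y|X_{\mathsf K}}$. Since this inequality is generally strict, $C^{\mathrm{simple}}(\mathscr W_K)$ is generally lower than the
fingerprinting capacity $C^{\mathrm{one}}(\mathscr W_K)$ of (3.8). Hence the simple thresholding rule (4.1) is generally not
capacity-achieving.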



5 Joint Fingerprint Decoder 

The encoder and joint decoder are presented in this section, and the performance of the new scheme
is analyzed. As in the previous section, the encoder ensures a false-positive error exponent $\Delta$ and
assumes a nominal value $K_{\mathrm{nom}}$ for coalition size. An arbitrarily large $L$ is selected, defining an
alphabet $\mathcal W = \{1, 2, \dots, L\}$. A random constant-composition code $\mathcal C(\mathbf s, \mathbf w) = \{\mathbf x_m, 1 \le m \le 2^{NR}\}$
is generated for each $\mathbf s \in \mathcal S^N$ and $\mathbf w \in T^*_{\mathbf w}$ by drawing $2^{NR}$ sequences independently and uniformly
from a conditional type class $T^*_{\mathbf x|\mathbf s \mathbf w}$. Both $T^*_{\mathbf w}$ and $T^*_{\mathbf x|\mathbf s \mathbf w}$ depend on $\Delta$ and $K_{\mathrm{nom}}$ as defined below
(10.6). Prior to encoding, a sequence $\mathbf W \in \mathcal W^N$ is drawn independently of $\mathbf S$ and uniformly from $T^*_{\mathbf w}$,
and shared with the receiver. Next, user $m$ is assigned codeword $\mathbf x_m \in \mathcal C(\mathbf S, \mathbf W)$, for $1 \le m \le 2^{NR}$.

In terms of decoding, the fundamental improvement over the simple strategy of Sec. 4 resides in
the use of a joint decoding rule. Specifically, the decoder maximizes a penalized empirical mutual
information score over all possible coalitions of any size. The penalty is proportional to the size of
the coalition.



5.1 Mutual Information of k Random Variables 

Our fingerprint decoding scheme is based on the notion of mutual information between $k$ random
variables $X_1, \dots, X_k$. For $k = 3$, this mutual information is defined as [11, p. 57] [21, p. 378]

$$\mathring I(X_1; X_2; X_3) = H(X_1) + H(X_2) + H(X_3) - H(X_1, X_2, X_3).$$

We use the symbol $\mathring I$ to distinguish it from the symbol $I$ for standard mutual information between
two random variables. Note the chain rule

$$\mathring I(X_1; X_2; X_3) = I(X_1; X_2 X_3) + I(X_2; X_3).$$

The mutual information between $k$ random variables $X_1, \dots, X_k$ is similarly defined as the sum
of their individual entropies minus their joint entropy [11, p. 57] or equivalently, the divergence
between their joint distribution and the product of their marginals:

$$\mathring I(X_1; \cdots; X_k) = H(X_1) + \cdots + H(X_k) - H(X_1, \dots, X_k) \qquad (5.1)$$
$$= D(p_{X_1 \cdots X_k} \,\|\, p_{X_1} \cdots p_{X_k}).$$

Note the following properties, including the chain rules (P3) and (P4):

(P1) The mutual information (5.1) is symmetric in its arguments;

(P2) $\mathring I(X_1; X_2) = I(X_1; X_2)$;

(P3) $\mathring I(X_1; \cdots; X_k) = I(X_1; X_2 \cdots X_k) + \mathring I(X_2; \cdots; X_k) = \sum_{i=1}^{k-1} I(X_i; X_{i+1} \cdots X_k)$;

(P4) $\mathring I(X_1; \cdots; X_k) = \mathring I(X_1; \cdots; X_i; X_{i+1} \cdots X_k) + \mathring I(X_{i+1}; \cdots; X_k)$ for any $i \in \{1, 2, \dots, k-2\}$;

(P5) $\mathring I(X_1; \cdots; X_k) = \sum_{i=1}^{k-1} H(X_i) - H(X_1 \cdots X_{k-1}|X_k)$.
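Definition (5.1) and the properties above are easy to verify numerically. The following sketch is an illustration only: the helper names and the example distribution (two independent uniform bits and their XOR) are assumptions chosen for the example. It computes the multi-variable mutual information both as a sum of entropies and as a divergence, and checks (P3) and (P5) for $k = 3$.

```python
from itertools import product
from math import log2

def marginal(joint, axes):
    """Marginal p.m.f. of `joint` on the coordinates listed in `axes`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def multi_info(joint, groups):
    """Sum of the group entropies minus their joint entropy, as in (5.1).
    Each group is a tuple of coordinates treated as one random variable."""
    all_axes = tuple(a for g in groups for a in g)
    return sum(entropy(marginal(joint, g)) for g in groups) - entropy(marginal(joint, all_axes))

def kl_to_product(joint):
    """Divergence between the joint p.m.f. and the product of its single-variable marginals."""
    k = len(next(iter(joint)))
    margs = [marginal(joint, (i,)) for i in range(k)]
    total = 0.0
    for outcome, p in joint.items():
        if p > 0:
            q = 1.0
            for i in range(k):
                q *= margs[i][(outcome[i],)]
            total += p * log2(p / q)
    return total

# Example: X1, X2 independent uniform bits, X3 = X1 XOR X2.
joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

ring_i = multi_info(joint, [(0,), (1,), (2,)])
p3 = multi_info(joint, [(0,), (1, 2)]) + multi_info(joint, [(1,), (2,)])
h12_given_3 = entropy(marginal(joint, (0, 1, 2))) - entropy(marginal(joint, (2,)))
p5 = entropy(marginal(joint, (0,))) + entropy(marginal(joint, (1,))) - h12_given_3
print(ring_i, kl_to_product(joint), p3, p5)
```

In this example the three variables are pairwise independent, so every ordinary pairwise mutual information is zero, yet the three-variable quantity equals one bit.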

Similarly to (5.1), we define the empirical mutual information $\mathring I(\mathbf x_1; \cdots; \mathbf x_k)$ between $k$ sequences
$\mathbf x_1, \dots, \mathbf x_k$, as the mutual information with respect to the joint type of $\mathbf x_1, \dots, \mathbf x_k$. Analogously to
Property (P5), we have

$$\mathring I(\mathbf x_1; \cdots; \mathbf x_k; \mathbf y) = \sum_{i=1}^{k} H(\mathbf x_i) - H(\mathbf x_1 \cdots \mathbf x_k|\mathbf y). \qquad (5.2)$$

This leads to the following alternative interpretation of the minimum-equivocation decoder of Liu
and Hughes [21]. If $\mathbf x_1, \dots, \mathbf x_k$ are codewords from a constant-composition code $\mathcal C$, then $H(\mathbf x_i)$ is
the same for all $i$, and the minimum-equivocation decoder is equivalent to a maximum-mutual-information
decoder:

$$\min_{\mathbf x_1 \cdots \mathbf x_k \in \mathcal C} H(\mathbf x_1 \cdots \mathbf x_k|\mathbf y) \iff \max_{\mathbf x_1 \cdots \mathbf x_k \in \mathcal C} \mathring I(\mathbf x_1; \cdots; \mathbf x_k; \mathbf y). \qquad (5.3)$$

There is no similar interpretation when ordinary mutual information $I(\mathbf x_1 \cdots \mathbf x_k; \mathbf y)$ is used [21]. Liu
and Hughes showed that the minimum-equivocation decoder outperforms the ordinary maximum-mutual-information
decoder in terms of random-coding exponent.
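5.2 MPMI Criterion

The restriction of $\mathbf x_{\mathcal M}$ to a subset $\mathcal A$ of $\mathcal M$ will be denoted by $\mathbf x_{\mathcal A} = \{\mathbf x_m, m \in \mathcal A\}$. For disjoint sets
$\mathcal A = \{m_1, \dots, m_{|\mathcal A|}\}$ and $\mathcal B = \{m_{|\mathcal A|+1}, \dots, m_{|\mathcal A|+|\mathcal B|}\}$, we use the shorthand

$$\mathring I(\mathbf x_{\mathcal A}; \mathbf y \mathbf x_{\mathcal B}|\mathbf s \mathbf w) = \mathring I(\mathbf x_{m_1}; \cdots; \mathbf x_{m_{|\mathcal A|}}; \mathbf y \mathbf x_{\mathcal B}|\mathbf s \mathbf w) \qquad (5.4)$$

for the mutual information between the $|\mathcal A| + 1$ random variables $\mathbf x_{m_1}, \dots, \mathbf x_{m_{|\mathcal A|}}$, and $(\mathbf y, \mathbf x_{\mathcal B})$,
conditioned on $(\mathbf s, \mathbf w)$.

Define the function

$$MPMI(k) = \begin{cases} 0 & : \text{if } k = 0 \\ \max_{\mathbf x_{\mathcal K} \in \mathcal C^k(\mathbf s, \mathbf w)} \left[ \mathring I(\mathbf x_{\mathcal K}; \mathbf y|\mathbf s \mathbf w) - k(R + \Delta) \right] & : \text{if } k = 1, 2, \dots \end{cases} \qquad (5.5)$$

where $k = |\mathcal K|$ and

$$\mathring I(\mathbf x_{\mathcal K}; \mathbf y|\mathbf s \mathbf w) = \mathring I(\mathbf x_1; \cdots; \mathbf x_k; \mathbf y|\mathbf s \mathbf w) = k H(\mathbf x|\mathbf s \mathbf w) - H(\mathbf x_{\mathcal K}|\mathbf y \mathbf s \mathbf w) \qquad (5.6)$$

is the mutual information between the $k + 1$ sequences $\mathbf x_1, \dots, \mathbf x_k, \mathbf y$, conditioned on $(\mathbf s, \mathbf w)$, as
defined in (5.4). Again we stress that $\mathring I(\mathbf x_1; \cdots; \mathbf x_k; \mathbf y|\mathbf s \mathbf w)$ should not be confused with the ordinary
mutual information $I(\mathbf x_1 \cdots \mathbf x_k; \mathbf y|\mathbf s \mathbf w)$ between the $k$-tuple $(\mathbf x_1, \dots, \mathbf x_k)$ and $\mathbf y$, conditioned on $(\mathbf s, \mathbf w)$.
Our joint fingerprint decoder is a Maximum Penalized Mutual Information (MPMI) decoder:

$$\max_{k \ge 0} MPMI(k). \qquad (5.7)$$

In case of a tie, the largest value of $k$ is retained. The decoder seeks the coalition size $k$ and the
codewords $\{\mathbf x_m, m \in \hat{\mathcal K}\}$ in $\mathcal C(\mathbf s, \mathbf w)$ that achieve the MPMI criterion above. The indices of these
codewords form the decoded coalition $\hat{\mathcal K}$. If the maximizing $k$ in (5.7) is zero, the receiver outputs
$\hat{\mathcal K} = \emptyset$. Similarly to (5.3), the MPMI decoder may equivalently be interpreted as a Minimum
Penalized Equivocation criterion.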
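The MPMI rule (5.5) to (5.7) can be sketched as a brute-force search over candidate coalitions. The code below is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the cap `k_max` on the coalition size, the toy codebook, and the 0-based user indices are all introduced for the example, and the exhaustive search is only practical for tiny codebooks. The score uses the sum of per-codeword entropies, which reduces to $k H(\mathbf x|\mathbf s \mathbf w)$ in (5.6) for a constant-composition code.

```python
from collections import Counter
from itertools import combinations
from math import log2

def cond_entropy(cols, cond, n):
    """Empirical H(cols | cond) in bits; cols and cond are lists of length-n sequences."""
    xs = list(zip(*cols))
    cs = list(zip(*cond)) if cond else [()] * n
    joint = Counter(zip(xs, cs))
    base = Counter(cs)
    return -sum(c / n * log2(c / base[key[1]]) for key, c in joint.items())

def penalized_score(members, y, s, w, rate, delta):
    """Empirical multi-variable mutual information of the members and y given (s, w),
    minus the penalty k (rate + delta)."""
    n = len(y)
    info = sum(cond_entropy([x], [s, w], n) for x in members)
    info -= cond_entropy(members, [y, s, w], n)
    return info - len(members) * (rate + delta)

def mpmi_decode(codebook, y, s, w, rate, delta, k_max):
    """Exhaustive search; ties are resolved in favor of the larger coalition size."""
    best_score, best_set = 0.0, frozenset()  # MPMI(0) = 0, empty coalition
    for k in range(1, k_max + 1):
        for combo in combinations(range(len(codebook)), k):
            score = penalized_score([codebook[m] for m in combo], y, s, w, rate, delta)
            if score > best_score or (score == best_score and k > len(best_set)):
                best_score, best_set = score, frozenset(combo)
    return set(best_set), best_score

# Toy example (hypothetical data): users 0 and 1 collude with a bitwise AND.
codebook = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
s = [0] * 8
w = [0] * 8
y = [a & b for a, b in zip(codebook[0], codebook[1])]
print(mpmi_decode(codebook, y, s, w, rate=0.1, delta=0.1, k_max=3))
```

In this example each colluder alone has a modest score, while the pair scores markedly higher than any single user or the full triple, which is the behavior the joint rule is designed to exploit.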



5.3 Properties 

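The following lemma shows that 1) each subset of the estimated coalition is significant, and 2) any
extension of the estimated coalition would fail a significance test.

Lemma 5.1 Let $\hat{\mathcal K}$ achieve the maximum in (5.5) and (5.7). Then

$$\forall \mathcal A \subseteq \hat{\mathcal K} \,:\, \mathring I(\mathbf x_{\mathcal A}; \mathbf y \mathbf x_{\hat{\mathcal K} \setminus \mathcal A}|\mathbf s \mathbf w) \ge |\mathcal A|(R + \Delta). \qquad (5.8)$$

Moreover, for every $\mathcal A$ disjoint with $\hat{\mathcal K}$,

$$\mathring I(\mathbf x_{\mathcal A}; \mathbf y \mathbf x_{\hat{\mathcal K}}|\mathbf s \mathbf w) \le |\mathcal A|(R + \Delta). \qquad (5.9)$$

Proof. Write $k = |\hat{\mathcal K}|$. For any $\mathcal A \subseteq \hat{\mathcal K}$, we have

$$\mathring I(\mathbf x_{\mathcal A}; \mathbf y \mathbf x_{\hat{\mathcal K} \setminus \mathcal A}|\mathbf s \mathbf w) - |\mathcal A|(R + \Delta)$$
$$\overset{(a)}{=} [\mathring I(\mathbf x_{\hat{\mathcal K}}; \mathbf y|\mathbf s \mathbf w) - k(R + \Delta)] - [\mathring I(\mathbf x_{\hat{\mathcal K} \setminus \mathcal A}; \mathbf y|\mathbf s \mathbf w) - (k - |\mathcal A|)(R + \Delta)]$$
$$\overset{(b)}{=} MPMI(k) - [\mathring I(\mathbf x_{\hat{\mathcal K} \setminus \mathcal A}; \mathbf y|\mathbf s \mathbf w) - (k - |\mathcal A|)(R + \Delta)]$$
$$\ge MPMI(k) - MPMI(k - |\mathcal A|)$$
$$\overset{(c)}{\ge} 0$$

where (a) follows from the chain rule for $\mathring I$, (b) holds because $\hat{\mathcal K}$ achieves the maximum in (5.5),
and (c) because $k$ achieves the maximum in (5.7). This proves (5.8).

To prove (5.9), consider any $\mathcal A$ disjoint with $\hat{\mathcal K}$ and let $\mathcal K' = \hat{\mathcal K} \cup \mathcal A$ and $k' = |\mathcal K'|$. We have

$$\mathring I(\mathbf x_{\mathcal A}; \mathbf y \mathbf x_{\hat{\mathcal K}}|\mathbf s \mathbf w) - |\mathcal A|(R + \Delta)$$
$$\overset{(a)}{=} [\mathring I(\mathbf x_{\mathcal K'}; \mathbf y|\mathbf s \mathbf w) - k'(R + \Delta)] - [\mathring I(\mathbf x_{\hat{\mathcal K}}; \mathbf y|\mathbf s \mathbf w) - k(R + \Delta)]$$
$$\overset{(b)}{=} [\mathring I(\mathbf x_{\mathcal K'}; \mathbf y|\mathbf s \mathbf w) - k'(R + \Delta)] - MPMI(k)$$
$$\le MPMI(k') - MPMI(k)$$
$$\overset{(c)}{\le} 0,$$

where (a), (b), (c) are justified in the same way as above. This proves (5.9). □

Reliability metric. The score

$$\mathring I(\mathbf x_{\hat{\mathcal K}}; \mathbf y|\mathbf s \mathbf w) - kR \ge k\Delta$$

represents a guilt index for the estimated coalition $\hat{\mathcal K}$. The larger this quantity is, the stronger the
evidence that the members of $\hat{\mathcal K}$ are guilty. Likewise,

$$\mathring I(\mathbf x_m; \mathbf y \mathbf x_{\hat{\mathcal K} \setminus \{m\}}|\mathbf s \mathbf w) - R \ge \Delta$$

is a guilt index for accused user $m \in \hat{\mathcal K}$, and

$$\mathring I(\mathbf x_m; \mathbf y \mathbf x_{\hat{\mathcal K}}|\mathbf s \mathbf w) - R \le \Delta$$

is a guilt index for user $m \notin \hat{\mathcal K}$. The smaller this index is, the stronger the evidence that $m$ is
innocent.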

5.4 Error Exponents 

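Theorem 5.2 below gives the false-positive and false-negative error exponents for our coding scheme.
These exponents are given in terms of the functions defined below.

Recall $\mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1)$ defined in (3.5). We similarly define

$$\mathscr P_{X_{\mathsf K}|SW}(p_{SW}, L, D_1) = \left\{ p_{X_{\mathsf K}|SW} = \prod_{k \in \mathsf K} p_{X_k|SW} \,:\, p_{X_1|SW} = \cdots = p_{X_K|SW}, \ \mathbb E\, d(S, X_1) \le D_1 \right\}.$$

Define now the following set of conditional p.m.f.'s for $X_{\mathsf K}$ given $S, W$ whose conditional marginal
p.m.f. $p_{X|SW}$ is the same for each $X_m$, $m \in \mathsf K$:

$$\mathscr M(p_{X|SW}) = \{p_{X_{\mathsf K}|SW} \,:\, p_{X_m|SW} = p_{X|SW}, \ \forall m \in \mathsf K\}.$$

Define for each $\mathsf A \subseteq \mathsf K$ the set of conditional p.m.f.'s

$$\mathscr P_{Y X_{\mathsf K}|SW}(p_W, \tilde p_{S|W}, p_{X|SW}, \mathscr W_K, R, L, \mathsf A)$$
$$= \left\{ \tilde p_{Y X_{\mathsf K}|SW} \,:\, \tilde p_{X_{\mathsf K}|SW} \in \mathscr M(p_{X|SW}), \ \tilde p_{Y|X_{\mathsf K}} \in \mathscr W_K(p_{X_{\mathsf K}}), \ \frac{1}{|\mathsf A|} \mathring I_{p_W \tilde p_{S|W} \tilde p_{Y X_{\mathsf K}|SW}}(X_{\mathsf A}; Y X_{\mathsf K \setminus \mathsf A}|S, W) \le R \right\} \qquad (5.10)$$

and the pseudo sphere packing exponent

$$E_{psp,\mathsf A}(R, L, p_W, \tilde p_{S|W}, p_{X|SW}, \mathscr W_K) = \min_{\tilde p_{Y X_{\mathsf K}|SW} \in \mathscr P_{Y X_{\mathsf K}|SW}(p_W, \tilde p_{S|W}, p_{X|SW}, \mathscr W_K, R, L, \mathsf A)} D(\tilde p_{Y X_{\mathsf K}|SW}\, \tilde p_{S|W} \,\|\, \tilde p_{Y|X_{\mathsf K}}\, p^K_{X|SW}\, p_S \,|\, p_W). \qquad (5.11)$$

Taking the maximum* and the minimum of $E_{psp,\mathsf A}$ above over all subsets $\mathsf A \subseteq \mathsf K$, we define

$$\overline{E}_{psp}(R, L, p_W, \tilde p_{S|W}, p_{X|SW}, \mathscr W_K) = E_{psp,\mathsf K}(R, L, p_W, \tilde p_{S|W}, p_{X|SW}, \mathscr W_K), \qquad (5.12)$$
$$\underline{E}_{psp}(R, L, p_W, \tilde p_{S|W}, p_{X|SW}, \mathscr W_K) = \min_{\mathsf A \subseteq \mathsf K} E_{psp,\mathsf A}(R, L, p_W, \tilde p_{S|W}, p_{X|SW}, \mathscr W_K). \qquad (5.13)$$

Now define

$$E_{psp}(R, L, D_1, \mathscr W_K) = \max_{p_W \in \mathscr P_W} \ \min_{\tilde p_{S|W} \in \mathscr P_{S|W}} \ \max_{p_{X|SW} \in \mathscr P_{X|SW}(p_W, \tilde p_{S|W}, L, D_1)} E_{psp,\mathsf K}(R, L, p_W, \tilde p_{S|W}, p_{X|SW}, \mathscr W^{\mathrm{fair}}_{K_{\mathrm{nom}}}). \qquad (5.14)$$

Denote by $p^*_W$ and $p^*_{X|SW}$ the maximizers in (5.14), where the latter is to be viewed as a function
of $\tilde p_{S|W}$. Also note that both $p^*_W$ and $p^*_{X|SW}$ implicitly depend on $R$ and $\mathscr W^{\mathrm{fair}}_{K_{\mathrm{nom}}}$. Finally, define

$$\overline{E}_{psp}(R, L, D_1, \mathscr W_K) = \min_{\tilde p_{S|W} \in \mathscr P_{S|W}} \overline{E}_{psp}(R, L, p^*_W, \tilde p_{S|W}, p^*_{X|SW}, \mathscr W_K), \qquad (5.15)$$

$$\underline{E}_{psp}(R, L, D_1, \mathscr W_K) = \min_{\tilde p_{S|W} \in \mathscr P_{S|W}} \underline{E}_{psp}(R, L, p^*_W, \tilde p_{S|W}, p^*_{X|SW}, \mathscr W_K). \qquad (5.16)$$

* The property that $\mathsf K$ achieves $\max_{\mathsf A \subseteq \mathsf K} E_{psp,\mathsf A}$ is derived in the proof of Theorem 5.2, Part (iv).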
Theorem 5.2 The decision rule (5.7) yields the following error exponents.

(i) The false-positive error exponent is

$$E_{FP}(R, D_1, \mathscr W_K, \Delta) = \Delta. \qquad (5.17)$$

(ii) The error exponent for the (false negative) probability that the decoder fails to catch all colluders
(misses some of them) is

$$E^{\mathrm{all}}(R, L, D_1, \mathscr W_K, \Delta) = \underline{E}_{psp}(R + \Delta, L, D_1, \mathscr W_K). \qquad (5.18)$$

(iii) The error exponent for the (false negative) probability that the decoder fails to catch even one
colluder (misses every single colluder) is

$$E^{\mathrm{one}}(R, L, D_1, \mathscr W_K, \Delta) = \overline{E}_{psp}(R + \Delta, L, D_1, \mathscr W_K). \qquad (5.19)$$

(iv) $E^{\mathrm{one}}(R, L, D_1, \mathscr W_K, \Delta) = E^{\mathrm{one}}(R, L, D_1, \mathscr W_K^{\mathrm{fair}}, \Delta)$.

(v) $E^{\mathrm{all}}(R, L, D_1, \mathscr W_K^{\mathrm{fair}}, \Delta) = E^{\mathrm{one}}(R, L, D_1, \mathscr W_K^{\mathrm{fair}}, \Delta)$.

(vi) If $K = K_{\mathrm{nom}}$, the suprema of all rates for which the error exponents of (5.18) and (5.19)
are positive are $C^{\mathrm{all}}(D_1, \mathscr W_K)$ and $C^{\mathrm{one}}(D_1, \mathscr W_K)$ of (3.9) and (3.8), respectively.

Note. The expressions (5.18) and (5.19) for the false-negative error exponents may be viewed
as sequences indexed by $L$. As discussed below (3.7) and in [13, Sec. 3.5], one may show that these
sequences are nondecreasing and converge to finite limits at a polynomial rate.

6 Error Exponents for Memoryless Collusion Channels

Consider the compound class (2.30) of memoryless channels. The theorems of Sec. 3 showed that
compound capacity is the same as for the main problem of (2.10). We now outline the derivation
of the error exponents.

Retracing the steps of the proof of Theorem 5.2, it may be seen that the expressions (5.17),
(5.18) and (5.19) for the error exponents remain valid, with two modifications. First, in (5.10), the
constraint $\tilde p_{Y|X_{\mathsf K}} \in \mathscr W_K$ is removed, and so the resulting set $\mathscr P^{\mathrm{memoryless}}_{Y X_{\mathsf K}|SW}$ is larger than $\mathscr P_{Y X_{\mathsf K}|SW}$ of
(5.10). Second, the divergence cost function

$$D(\tilde p_{Y X_{\mathsf K}|SW}\, \tilde p_{S|W} \,\|\, \tilde p_{Y|X_{\mathsf K}}\, p^K_{X|SW}\, p_S \,|\, p_W) \qquad (6.1)$$

in the expression (5.11) for the pseudo sphere packing exponent $E_{psp,\mathsf A}$ is replaced by*

$$\min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K} D(\tilde p_{Y X_{\mathsf K}|SW}\, \tilde p_{S|W} \,\|\, p_{Y|X_{\mathsf K}}\, p^K_{X|SW}\, p_S \,|\, p_W); \qquad (6.2)$$

* This can be traced back to (10.15), where $\tilde p_{Y|X_{\mathsf K}}$ is now replaced with $p_{Y|X_{\mathsf K}}$ in the asymptotic expression for
the probability of the conditional type class $T_{\mathbf y \mathbf x_{\mathsf K}|\mathbf s \mathbf w}$.
denote by $E^{\mathrm{memoryless}}_{psp,\mathsf A}$ the corresponding pseudo sphere packing exponent.

The divergences in (6.1) and (6.2) coincide when $p_{Y|X_{\mathsf K}} = \tilde p_{Y|X_{\mathsf K}}$; thus (6.2) is upper-bounded by
(6.1). Since $p_{Y|X_{\mathsf K}} = \tilde p_{Y|X_{\mathsf K}}$ is feasible for $\mathscr P_{Y X_{\mathsf K}|SW}$ of (5.10), we conclude that $E^{\mathrm{memoryless}}_{psp,\mathsf A} \le E_{psp,\mathsf A}$
of (5.11). Hence the false-negative error exponents in the memoryless case are upper-bounded by
those of Theorem 5.2. This phenomenon is similar to results in [13]: due to the use of RM codes,
the colluders' optimal strategy is a nearly-memoryless strategy, but they are precluded from using
a truly memoryless strategy because that would violate the hard constraint $p_{\mathbf y|\mathbf x_{\mathsf K}} \in \mathscr W_K$. In the
memoryless case, the worst conditional type (which determines the false-negative error exponents)
might be such that $p_{\mathbf y|\mathbf x_{\mathsf K}} \notin \mathscr W_K$.



7 Proof of Converse for Theorem 3.3 (Detect-All)

By Lemma 3.2, it suffices to prove the claim for the compound class of memoryless channels
$\mathscr W_K(p_{X_{\mathsf K}})$ of (2.30). To simplify the presentation, the proof is first given in the special case where
$\mathscr W_K(p_{X_{\mathsf K}})$ is independent of $p_{X_{\mathsf K}}$. Let $K$ be the size of the coalition and $(f_N, g_N)$ a sequence of length-$N$,
rate-$R$ codes. We show that for any such sequence of codes, reliable decoding of the fingerprints is
possible only if $R \le C^{\mathrm{all}}(D_1, \mathscr W_K)$ under the detect-all criterion. Recall that the encoder generates
marked copies $\mathbf x_m = f_N(\mathbf s, v, m)$ for $1 \le m \le 2^{NR}$ and the decoder outputs an estimated coalition
$g_N(\mathbf y, \mathbf s, v) \subseteq \{1, \dots, 2^{NR}\}$.

Step 1. A lower bound on error probability is obtained when a helper provides some information
to the decoder. In our derivations below, the helper informs the decoder that the coalition size
is $K$. There are $\binom{2^{NR}}{K} \le 2^{KNR}$ possible coalitions of size $K$. We represent a coalition as
$M_{\mathsf K} = \{M_1, \dots, M_K\}$, where $M_k$, $1 \le k \le K$, are assumed to be drawn i.i.d. uniformly* from
$\{1, \dots, 2^{NR}\}$. Let $\mathbf X_k = \mathbf x_{M_k}$, $1 \le k \le K$, and $\mathbf X_{\mathsf K} = \{\mathbf X_1, \dots, \mathbf X_K\}$. Assuming a memoryless
collusion channel $p_{Y|X_{\mathsf K}} \in \mathscr W_K$ is in effect, the joint p.m.f. of $(M_{\mathsf K}, \mathbf S, V, \mathbf X_{\mathsf K}, \mathbf Y)$ is given by

$$p_{M_{\mathsf K} \mathbf S V \mathbf X_{\mathsf K} \mathbf Y} = p_S^N\, p_V \left( \prod_{1 \le k \le K} p_{M_k}\, \mathbb 1\{\mathbf X_k = f_N(\mathbf S, V, M_k)\} \right) p^N_{Y|X_{\mathsf K}}. \qquad (7.1)$$

Define the random variables $Q_t = \{V, S_j, j \ne t\} \in \mathcal V_N \times \mathcal S^{N-1}$ for $1 \le t \le N$. By assumption,
$S_t$ and $Q_t$ are independent, and $X_{kt}$, $1 \le k \le K$, are conditionally i.i.d. given $(S_t, Q_t) = (\mathbf S, V)$.
However, note that $X_{kt}$, $1 \le k \le K$, are generally conditionally dependent given $(S_t, V)$ alone. The
joint p.m.f. of $(S_t, Q_t, X_{\mathsf K,t}, Y_t)$ is

$$p_{S_t}\, p_{Q_t} \left( \prod_{1 \le k \le K} p_{X_{kt}|S_t Q_t} \right) p_{Y|X_{\mathsf K}}, \quad 1 \le t \le N.$$

Now define a time-sharing random variable $T$, uniformly distributed over $\{1, \dots, N\}$, and independent
of the other random variables. Let

$$X_k = X_{k,T} \in \mathcal X, \quad Y = Y_T \in \mathcal Y, \quad S = S_T \in \mathcal S,$$

$$W = (Q_T, T) \in \mathcal W = \mathcal V_N \times \mathcal S^{N-1} \times \{1, \dots, N\}. \qquad (7.2)$$

* Capacity could be higher if there were constraints on the formation of coalitions.
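The joint p.m.f. of $(S, W, X_{\mathsf K}, Y)$ is

$$p_S\, p_W \left( \prod_{1 \le k \le K} p_{X_k|SW} \right) p_{Y|X_{\mathsf K}}. \qquad (7.3)$$

For each $k \in \mathsf K$,

$$\mathbb E\, d(S, X_k) = \frac{1}{N} \sum_{t=1}^{N} \mathbb E\, d(S_t, X_{kt}) \le D_1.$$

Hence $p_{X_{\mathsf K} W|S}$ belongs to the set $\mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1)$ of (3.5), with $L = |\mathcal W| = N\, |\mathcal V_N|\, |\mathcal S|^{N-1}$.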

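Step 2. Our single-letter expressions are derived from the following inequality, which is valid
for all $\mathsf A \subseteq \mathsf K$:

$$I(M_{\mathsf A}; \mathbf Y|\mathbf S, V) \overset{(a)}{\le} I(\mathbf X_{\mathsf A}; \mathbf Y|\mathbf S, V)$$
$$= I(\mathbf X_{\mathsf A}; \mathbf Y|\mathbf X_{\mathsf K \setminus \mathsf A}, \mathbf S, V) + I(\mathbf X_{\mathsf A}; \mathbf X_{\mathsf K \setminus \mathsf A}|\mathbf S, V) - I(\mathbf X_{\mathsf A}; \mathbf X_{\mathsf K \setminus \mathsf A}|\mathbf Y, \mathbf S, V)$$
$$\overset{(b)}{\le} I(\mathbf X_{\mathsf A}; \mathbf Y|\mathbf X_{\mathsf K \setminus \mathsf A}, \mathbf S, V)$$
$$= H(\mathbf Y|\mathbf X_{\mathsf K \setminus \mathsf A}, \mathbf S, V) - H(\mathbf Y|\mathbf X_{\mathsf K}, \mathbf S, V)$$
$$\overset{(c)}{=} H(\mathbf Y|\mathbf X_{\mathsf K \setminus \mathsf A}, \mathbf S, V) - H(\mathbf Y|\mathbf X_{\mathsf K})$$
$$\overset{(d)}{=} \sum_{t=1}^{N} H(Y_t|Y^{t-1}, \mathbf X_{\mathsf K \setminus \mathsf A}, \mathbf S, V) - \sum_{t=1}^{N} H(Y_t|X_{\mathsf K,t})$$
$$\overset{(e)}{\le} \sum_{t=1}^{N} H(Y_t|X_{\mathsf K \setminus \mathsf A,t}, \mathbf S, V) - \sum_{t=1}^{N} H(Y_t|X_{\mathsf K,t})$$
$$\overset{(f)}{=} \sum_{t=1}^{N} H(Y_t|X_{\mathsf K \setminus \mathsf A,t}, S_t, Q_t) - \sum_{t=1}^{N} H(Y_t|X_{\mathsf K,t}, S_t, Q_t)$$
$$= \sum_{t=1}^{N} I(X_{\mathsf A,t}; Y_t|X_{\mathsf K \setminus \mathsf A,t}, S_t, Q_t)$$
$$= N\, I(X_{\mathsf A}; Y|X_{\mathsf K \setminus \mathsf A}, S, W) \qquad (7.4)$$

where (a) is due to the data processing inequality, (b) holds because the codewords $\{\mathbf X_k, 1 \le k \le K\}$
are mutually independent given $(\mathbf S, V)$, (c) because $(\mathbf S, V) \to \mathbf X_{\mathsf K} \to \mathbf Y$ forms a Markov chain, (d)
is obtained using the chain rule for entropy and the fact that the collusion channel is memoryless,
(e) holds because conditioning reduces entropy, and (f) because $(\mathbf S, V) = (S_t, Q_t) \to X_{\mathsf K,t} \to Y_t$
forms a Markov chain.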
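The last equality in (7.4) is the usual time-sharing identity: a sum of per-letter mutual informations equals $N$ times a single mutual information conditioned on a uniform time index. The sketch below illustrates that identity in a stripped-down setting, assuming only two variables $X_t, Y_t$ per time instant and no side information; the function names and the two example p.m.f.'s are made up for the illustration.

```python
from math import log2

def mutual_info(joint):
    """I(X;Y) in bits for a p.m.f. given as {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)

def cond_mutual_info_given_t(per_time_joints):
    """I(X;Y|T) where T is uniform over the time indices and (X, Y) = (X_T, Y_T)."""
    n = len(per_time_joints)
    p_txy = {(t, x, y): p / n
             for t, joint in enumerate(per_time_joints)
             for (x, y), p in joint.items()}
    p_tx, p_ty, p_t = {}, {}, {}
    for (t, x, y), p in p_txy.items():
        p_tx[(t, x)] = p_tx.get((t, x), 0.0) + p
        p_ty[(t, y)] = p_ty.get((t, y), 0.0) + p
        p_t[t] = p_t.get(t, 0.0) + p
    return sum(p * log2(p * p_t[t] / (p_tx[(t, x)] * p_ty[(t, y)]))
               for (t, x, y), p in p_txy.items() if p > 0)

# Hypothetical per-letter p.m.f.'s: at t = 0, Y copies X; at t = 1, X and Y are independent.
per_time = [
    {(0, 0): 0.5, (1, 1): 0.5},
    {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},
]
sum_per_letter = sum(mutual_info(j) for j in per_time)
print(sum_per_letter, len(per_time) * cond_mutual_info_given_t(per_time))
```

Conditioning on $T$ matters here: the unconditional mutual information of the time-averaged pair is generally a different number.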

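Step 3. Under collusion channel $p_{Y|X_{\mathsf K}} \in \mathscr W_K$, let $P_e^{\mathrm{all}}(p_{Y|X_{\mathsf K}}) = \Pr[\hat{\mathcal K} \ne \mathcal K]$ be the decoding
error probability of the detect-all decoder. The following inequalities hold for every subset $\mathsf A$ of $\mathsf K$:

$$|\mathsf A|\, NR \overset{(a)}{=} H(M_{\mathsf A}) \overset{(b)}{=} H(M_{\mathsf A}|\mathbf S, V)$$
$$= H(M_{\mathsf A}|\mathbf Y, \mathbf S, V) + I(M_{\mathsf A}; \mathbf Y|\mathbf S, V)$$
$$\le H(M_{\mathsf K}|\mathbf Y, \mathbf S, V) + I(M_{\mathsf A}; \mathbf Y|\mathbf S, V)$$
$$\overset{(c)}{\le} 1 + P_e^{\mathrm{all}}(p_{Y|X_{\mathsf K}})\, KNR + I(M_{\mathsf A}; \mathbf Y|\mathbf S, V) \qquad (7.5)$$

where (a) holds because $M_{\mathsf A}$ is uniformly distributed over $\{1, \dots, 2^{|\mathsf A| NR}\}$, (b) because $M_{\mathsf A}$ and
$(\mathbf S, V)$ are independent, and (c) because of Fano's inequality.

For the error probability $P_e^{\mathrm{all}}(p_{Y|X_{\mathsf K}})$ to vanish for each $p_{Y|X_{\mathsf K}} \in \mathscr W_K$, we need

$$R \le \liminf_{N \to \infty} \ \min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K} \ \min_{\mathsf A \subseteq \mathsf K} \frac{1}{N |\mathsf A|} I(M_{\mathsf A}; \mathbf Y|\mathbf S, V). \qquad (7.6)$$

We have

$$\min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K} \ \min_{\mathsf A \subseteq \mathsf K} \frac{1}{N |\mathsf A|} I(M_{\mathsf A}; \mathbf Y|\mathbf S, V)$$
$$\overset{(a)}{\le} \min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K} \ \min_{\mathsf A \subseteq \mathsf K} \frac{1}{|\mathsf A|} I(X_{\mathsf A}; Y|X_{\mathsf K \setminus \mathsf A}, S, W)$$
$$\overset{(b)}{\le} \max_{p_{X_{\mathsf K} W|S} \in \mathscr P_{X_{\mathsf K} W|S}(p_S, L(N), D_1)} \ \min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K} \ \min_{\mathsf A \subseteq \mathsf K} \frac{1}{|\mathsf A|} I(X_{\mathsf A}; Y|X_{\mathsf K \setminus \mathsf A}, S, W)$$
$$\le \sup_{L} \ \max_{p_{X_{\mathsf K} W|S} \in \mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1)} \ \min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K} \ \min_{\mathsf A \subseteq \mathsf K} \frac{1}{|\mathsf A|} I(X_{\mathsf A}; Y|X_{\mathsf K \setminus \mathsf A}, S, W)$$
$$\overset{(c)}{\le} \lim_{L \to \infty} \ \max_{p_{X_{\mathsf K} W|S} \in \mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1)} \ \min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K} \ \min_{\mathsf A \subseteq \mathsf K} \frac{1}{|\mathsf A|} I(X_{\mathsf A}; Y|X_{\mathsf K \setminus \mathsf A}, S, W) \qquad (7.7)$$

where (a) is due to (7.4), (b) to the fact that $p_{X_{\mathsf K} W|S}$ given in (7.3) belongs to the set
$\mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1)$ defined in (3.5), with $L = L(N) = N\, |\mathcal V_N|\, |\mathcal S|^{N-1}$, and (c) because the
supremand is nondecreasing with $L$.

Combining (7.6) and (7.7), we obtain

$$R \le \lim_{L \to \infty} \ \max_{p_{X_{\mathsf K} W|S} \in \mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1)} \ \min_{p_{Y|X_{\mathsf K}} \in \mathscr W_K} \ \min_{\mathsf A \subseteq \mathsf K} \frac{1}{|\mathsf A|} I(X_{\mathsf A}; Y|X_{\mathsf K \setminus \mathsf A}, S, W), \qquad (7.8)$$

which establishes (3.9).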



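Step 4. It remains to prove the claim in the case $\mathscr W_K$ depends on the joint type $p_{\mathbf x_{\mathsf K}} = p_{X_{\mathsf K}}$
(this last equality is justified by (7.2)) which we denote by $Z$ to make the notation more compact.
Let $p_Z$ be the p.m.f. for the random variable $Z \in \mathcal Z = \mathscr P^{[N]}_{X_{\mathsf K}}$; the cardinality of $\mathcal Z$ is at most
$(N + 1)^{|\mathcal X|^K}$. Define the following sets which form a partition of $\mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1)$:

$$\forall z \in \mathcal Z \,:\, \mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1, z) = \{p_{X_{\mathsf K} W|S} \in \mathscr P_{X_{\mathsf K} W|S}(p_S, L, D_1) \,:\, p_{X_{\mathsf K}} = z\}.$$

Since the channel $p_{Y|X_{\mathsf K}}$ selected by the coalition may depend on $Z$, we indicate this dependency
explicitly by representing the channel as $p_{Y|X_{\mathsf K} Z}$ and the set of feasible channels as

$$\mathscr W_K = \{p_{Y|X_{\mathsf K} Z} \,:\, p_{Y|X_{\mathsf K}, Z=z} \in \mathscr W_K(z), \ \forall z \in \mathcal Z\}. \qquad (7.9)$$

The error probability of the decoder is not increased if a helper provides it with the joint type $Z$.
The entropy of $Z$ is at most $\log |\mathcal Z| \le |\mathcal X|^K \log(N + 1)$. Fano's inequality (7.5) becomes

$$|\mathsf A|\, NR = H(M_{\mathsf A}|\mathbf S, V) \le H(M_{\mathsf A}, Z|\mathbf S, V)$$
$$\le |\mathcal X|^K \log(N + 1) + H(M_{\mathsf A}|\mathbf S, V, Z)$$
$$\le |\mathcal X|^K \log(N + 1) + 1 + P_e^{\mathrm{all}}(p_{Y|X_{\mathsf K} Z})\, KNR + I(M_{\mathsf A}; \mathbf Y|\mathbf S, V, Z). \qquad (7.10)$$

For the error probability $P_e^{\mathrm{all}}(p_{Y|X_{\mathsf K} Z})$ to vanish for each $p_{Y|X_{\mathsf K} Z} \in \mathscr W_K$, we need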

R < liminf min min — ; — -I(M&:Y\S,V, Z) 

~ Pr,XKze#kACKiV|A| 

< liminf min mm —I{Xj\;Y\Xi<^\p^, S,W, Z) 

PyiXKz6#A'ACK |A| 

< lim max min mm — I(Xfi,-,Y\Xy<^\ ft^, S,W, Z) 

= lim max max min 

N^oo Pz {pxnW\s(^-^Xf^w\s{PsMN),Di,z)}^f.z {PY\Xf^eP'K{z)}zez 

^I^K\A, S,W,Z = z) 

< lim max > Pz{z) max min 
Ar_,oo PZ ^ PXy^w\s(^-'S'x^w\s{PsMN),Dr,z) Py\x^&'^k{z) 

min ^J{X^■, Y\X^\f,, S,W,Z = z) 

= lim max max min min- — -I{Xf^;Y\Xu\f,, S^W, Z = z) 

N^oo zeZ Px^^w\S<^^x^,w\s{PsMN),Duz) PY\x^,<^^Kiz) ACK |A| ^ 

= lim max min min- — -I(Xfi,:Y\Xi<_\A, S,W) 

Px^,w\s<^-'^x^^w\siPsMN),Di) PY\x^,&^KiPx^) ACK |A| ^ 

< lim max min min -——I{X/^;Y\X[<^\/:^, S,W). ('''•ll) 

L^'^ PX^^W\S'^i^X^^W\s(PS,L,Di) PY\x^^'^^K{PXt^) ^'^'^ |A| 

The resulting upper bound on $R$ is the same as in (7.8), with $\mathscr{W}_K(p_{X_{\mathsf K}})$ in place of $\mathscr{W}_K$.

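Step 5. Fair Collusion Channels. If the collusion channel is fair, then applying Property (3.3), we obtain

\[ \frac{1}{K}\, I(X_{\mathsf K}; Y | S, W) \le \frac{1}{|\mathsf A|}\, I(X_{\mathsf A}; Y | X_{\mathsf K \setminus \mathsf A}, S, W), \quad \forall \mathsf A \subseteq \mathsf K, \]

and thus $C^{\mathrm{all}}(D_1, \mathscr{W}_K^{\mathrm{fair}}) = C^{\mathrm{one}}(D_1, \mathscr{W}_K^{\mathrm{fair}})$. □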

8 Proof of Converse for Theorem 3.4 (Detect-One)

By Lemma 3.2 it suffices to prove the claim for the compound class of memoryless channels $\mathscr{W}_K(p_{\mathbf x_{\mathsf K}})$. Let $\mathcal M_N = \{1, 2, \cdots, 2^{NR}\}$. For notational simplicity, assume two colluders ($K = 2$). The proof extends straightforwardly to larger coalitions. For the detect-one criterion, it is sufficient to consider decoding rules that return exactly one user index, i.e., the decoding rule is a mapping $g_N : \mathcal Y^N \times \mathcal S^N \times \mathcal V_N \to \mathcal M_N$. Denote by $\mathcal D_i(\mathbf s, v)$ the decoding region for user $i$, i.e.,

\[ \mathcal D_i(\mathbf s, v) = \{ \mathbf y \in \mathcal Y^N : g_N(\mathbf y, \mathbf s, v) = i \}. \]

The decoding regions form a partition of $\mathcal Y^N$.






Without loss of optimality assume that randomly permuted codes (Def. 2.3) are used, in which maximum and average error probabilities over $i, j \in \mathcal M_N$ are equal (Prop. 2.1). The proof is organized along ten steps. As in the previous section, we initially assume that the feasible set $\mathscr{W}_K$ is independent of $p_{\mathbf x_{\mathsf K}}$. Then an arbitrarily small parameter $\delta > 0$ is chosen.

Step 1 introduces the basic random variables used in the proof. Step 2 defines for each $(\mathbf s, v)$ a set of bad codewords that have exponentially many neighbors within Hamming balls of radius $N\delta$ centered at these codewords. The remaining codewords constitute the so-called good set. Step 3 introduces a dense, nested family $\mathscr{W}^{\mathrm{fair}}_{K,\delta}$ of subsets of $\mathscr{W}^{\mathrm{fair}}_K$, indexed by $\delta$ and consisting of "nice channels". An equivalence is given between the Hamming distance of two codewords and statistical distinguishability of the output of any $p_{Y|X_1X_2} \in \mathscr{W}^{\mathrm{fair}}_{K,\delta}$. Step 4 (a) defines a reference product conditional p.m.f. for $\mathbf Y$ given $\mathbf S, V$; (b) associates a conditional self-information to each pair of codewords; and (c) defines a large set (of size slightly lower than $2^{2NR}$) of codeword pairs whose conditional self-information is within $\delta$ of their average value. Step 5 defines a typical set for $\mathbf Y$ given $\mathbf S$, $V$, and $\mathcal K$. Step 6 shows that typical sets for good codeword pairs have weak overlap. Step 7 defines a typical set for the host sequence $\mathbf S$. Step 8 upper bounds the probability of correct decoding

\[
\begin{aligned}
\forall i,j \in \mathcal M_N : \quad P_c(f_N, g_N, p_{Y|X_1X_2}) &= \Pr[g_N(\mathbf Y, \mathbf S, V) \in \{i,j\} \mid \mathcal K = \{i,j\}] \\
&= \sum_{\mathbf s \in \mathcal S^N} p_S^N(\mathbf s) \sum_{v \in \mathcal V_N} p_V(v) \sum_{\mathbf y \in \mathcal D_i(\mathbf s,v) \cup \mathcal D_j(\mathbf s,v)} p^N_{Y|X_1X_2}(\mathbf y \,|\, \mathbf x_i(\mathbf s,v), \mathbf x_j(\mathbf s,v))
\end{aligned}
\]

as the maximum of two bounds, one corresponding to a code with many good codewords and the other one to a code with many bad codewords. The upper bound is given in terms of a mutual information. Step 9 derives an upper bound on this mutual information and shows that any achievable rate $R$ must be less than half of the upper bound. The proof is completed by letting $\delta \downarrow 0$. The restriction that $\mathscr{W}_K$ is independent of $p_{\mathbf x_{\mathsf K}}$ is relaxed in Step 10.

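Step 1. Define $Q_t = \{V, S_j, j \ne t\}$ over the alphabet $\mathcal Q_N = \mathcal V_N \times \mathcal S^{N-1}$. We have $(S_t, Q_t) = (\mathbf S, V)$ for each $1 \le t \le N$. Since the host sequence type $p_{\mathbf s}$ together with any $q_t$, $1 \le t \le N$, uniquely determines $s_t$ and thus the pair $(\mathbf s, v)$ (and vice versa), we may also use $(p_{\mathbf s}, q)$ as an equivalent representation of the pair $(\mathbf s, v)$. Define a time-sharing random variable $T$ uniformly distributed over $\{1, 2, \cdots, N\}$ and let

\[ S = S_T, \quad Q = Q_T, \quad Y = Y_T, \quad \text{and} \quad X_i = X_{i,T}(\mathbf S, V), \; \forall i \in \mathcal M_N. \]

Define the random variable $X$ drawn uniformly from $\{X_i, i \in \mathcal M_N\}$. Its conditional p.m.f. given $\mathbf S, V, T$ is given by

\[ p_{X|\mathbf S V T}(x | \mathbf s, v, t) = 2^{-NR} \sum_{i \in \mathcal M_N} 1\{x_{it}(\mathbf s, v) = x\}, \quad \forall x, \mathbf s, v, t. \tag{8.2} \]

For each pair $i, j \in \mathcal M_N$, the joint distribution of $(\mathbf S, V, T, X_i, X_j, Y)$ is $p_{\mathbf S}\, p_V\, p_T\, p_{X_iX_j|\mathbf S V T}\, p_{Y|X_1X_2}$ where

\[ p_{X_iX_j|\mathbf S V T}(x_1, x_2 | \mathbf s, v, t) = 1\{x_{it}(\mathbf s, v) = x_1, \; x_{jt}(\mathbf s, v) = x_2\}, \quad x_1, x_2 \in \mathcal X, \; i, j \in \mathcal M_N, \; 1 \le t \le N. \tag{8.3} \]

By (8.2), the average of (8.3) over $i, j$ is the product conditional p.m.f.

\[ 2^{-2NR} \sum_{i,j \in \mathcal M_N} p_{X_iX_j|\mathbf S V T}(x_1, x_2 | \mathbf s, v, t) = p_{X|\mathbf S V T}(x_1 | \mathbf s, v, t)\; p_{X|\mathbf S V T}(x_2 | \mathbf s, v, t). \tag{8.4} \]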
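The identity (8.4) can be checked numerically on a toy codebook. The following is a minimal sketch, assuming a small random codebook with no host sequence or key; the names `empirical_marginal` and `averaged_pair_pmf` are illustrative and do not come from the paper.

```python
import itertools
import random

def empirical_marginal(codebook, t, alphabet):
    """p_X(x | t): fraction of codewords whose t-th symbol equals x, as in (8.2)."""
    m = len(codebook)
    return {a: sum(cw[t] == a for cw in codebook) / m for a in alphabet}

def averaged_pair_pmf(codebook, t, alphabet):
    """Average over all ordered pairs (i, j) of the indicator in (8.3)."""
    m = len(codebook)
    avg = {(a, b): 0.0 for a in alphabet for b in alphabet}
    for cw_i, cw_j in itertools.product(codebook, repeat=2):
        avg[(cw_i[t], cw_j[t])] += 1.0 / (m * m)
    return avg

random.seed(0)
alphabet = (0, 1, 2)
N, M = 5, 8  # toy blocklength and codebook size (M plays the role of 2^{NR})
codebook = [tuple(random.choice(alphabet) for _ in range(N)) for _ in range(M)]

# Largest deviation between the pair average and the product of marginals.
max_gap = 0.0
for t in range(N):
    p = empirical_marginal(codebook, t, alphabet)
    avg = averaged_pair_pmf(codebook, t, alphabet)
    for a, b in avg:
        max_gap = max(max_gap, abs(avg[(a, b)] - p[a] * p[b]))
print(max_gap)
```

The average runs over all ordered pairs, including $i = j$, which is what makes the product form exact rather than approximate.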

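Step 2. Denote by

\[ d_H(\mathbf x, \mathbf x') = \sum_{t=1}^N 1\{x_t \ne x'_t\} \]

the Hamming distance between two sequences $\mathbf x$ and $\mathbf x'$ in $\mathcal X^N$, and by

\[ \mathcal M_j(\mathbf s, v, \delta) = \{k \in \mathcal M_N : d_H(\mathbf x_j(\mathbf s, v), \mathbf x_k(\mathbf s, v)) \le N\delta\}, \quad j \in \mathcal M_N, \; \mathbf s \in \mathcal S^N, \; v \in \mathcal V_N, \; 0 \le \delta \le 1 \tag{8.5} \]

the set of indices $k$ for the codewords $\mathbf x_k(\mathbf s, v)$ that are within Hamming distance $N\delta$ of codeword $\mathbf x_j(\mathbf s, v)$, and by $M_j(\mathbf s, v, \delta) = |\mathcal M_j(\mathbf s, v, \delta)|$ the cardinality of this set. The function $M_j(\mathbf s, v, \cdot) - 1$ is akin to a cumulative distance distribution. It is nondecreasing, with $M_j(\mathbf s, v, 0) \ge 1$ and $M_j(\mathbf s, v, 1) = 2^{NR}$. Note that for $\mathcal S = \emptyset$ and random codes over $\mathcal X = \{0, 1\}$, $M_j(V, \delta) - 1$ is a random variable whose expectation vanishes as $N \to \infty$ for $\delta < d_{GV}(R)$, the Gilbert-Varshamov distance at rate $R$ [22].

Denote by

\[ \mathcal M^{\mathrm{bad}}(\mathbf s, v, \delta) = \{j \in \mathcal M_N : |\mathcal M_j(\mathbf s, v, \delta)| > 2^{2N\sqrt{\delta}}\} \tag{8.6} \]

a set of "bad" indices $j$ (there are more than $2^{2N\sqrt{\delta}}$ codewords within Hamming distance $N\delta$ of codeword $\mathbf x_j(\mathbf s, v)$), and by

\[ \mathcal M^{\mathrm{good}}(\mathbf s, v, \delta) = \{j \in \mathcal M_N : |\mathcal M_j(\mathbf s, v, \delta)| \le 2^{2N\sqrt{\delta}}\}, \quad v \in \mathcal V_N, \; 0 \le \delta \le 1 \tag{8.7} \]

the complementary set of "good" indices.

Note that any code with normalized minimum distance $\delta_{\min} > 0$ satisfies $M_j(\mathbf s, v, \delta) = 1$ and thus $\mathcal M^{\mathrm{good}}(\mathbf s, v, \delta) = \mathcal M_N$ for all $0 \le \delta < \delta_{\min}$.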
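The neighbor counts of (8.5) and the good/bad split of (8.6) and (8.7) can be illustrated on a small binary codebook. This is a sketch under stated assumptions: the codebook is random with a planted cluster, and `threshold` is a free parameter standing in for the exponential threshold of (8.6), which is far too large to be meaningful at toy blocklengths.

```python
import random

def hamming(x, y):
    """Hamming distance d_H between two equal-length sequences."""
    return sum(a != b for a, b in zip(x, y))

def neighbor_counts(codebook, delta):
    """For each codeword j, count codewords k (including k = j) with d_H <= N * delta."""
    n = len(codebook[0])
    return [sum(hamming(cj, ck) <= n * delta for ck in codebook) for cj in codebook]

def split_good_bad(codebook, delta, threshold):
    """Indices with at most `threshold` neighbors are good; the rest are bad."""
    counts = neighbor_counts(codebook, delta)
    good = [j for j, c in enumerate(counts) if c <= threshold]
    bad = [j for j, c in enumerate(counts) if c > threshold]
    return good, bad

random.seed(1)
N, M = 16, 32  # toy blocklength and codebook size
codebook = [tuple(random.randint(0, 1) for _ in range(N)) for _ in range(M)]
# Plant a tight cluster: codewords 1..4 each differ from codeword 0 in one position.
for k in range(1, 5):
    cw = list(codebook[0])
    cw[k] ^= 1
    codebook[k] = tuple(cw)

good, bad = split_good_bad(codebook, delta=0.125, threshold=4)
print("bad indices:", bad)
```

With radius $N\delta = 2$, the five clustered codewords are mutual neighbors, so each exceeds the threshold and is classified as bad.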

Step 3. Channels $p_{Y|X_1X_2}$ that satisfy $p_{Y|X_1X_2}(y|x_1, x_2) = 0$ or $p_{Y|X_1X_2}(y|x_1, x_2) = p_{Y|X_1X_2}(y|x_1, x'_2)$ for some $y, x_1, (x_2 \ne x'_2)$ require special handling. To this end, we define the following nested family of subsets of $\mathscr{W}^{\mathrm{fair}}_K$, indexed by $0 < \delta \le 1/|\mathcal Y|$:



^fair 



{py\x^X2 e ^t" ■■ PY\X^X2iy\xi,X2) > S, 



6 < 



, PY\X:,X2iy\xi,X2) 



PY\XiX2{y\Xl,X2) 

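By continuity of the mutual information functional and the definition (8.8), we have

\[ C(\mathscr{W}^{\mathrm{fair}}_{K,\delta}) \downarrow C(\mathscr{W}^{\mathrm{fair}}_K) \quad \text{as } \delta \downarrow 0. \]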
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For any quadruple (xi,X2,X2,y) and any Py\XiX2 ^ ^k^Si have 



Hence, for each s,v,i,j,y, we have 
dH{^j{s,v),:sik{s,v)) < N6 

and 



1 , Py|XiX2(yl^»(S'^)'^j(S'^)) 



Py|XiX2(y|xi'X2) 



(8.9) 



1 Py|XiX2(yl^i(^'^)'^i(^''^)) 



(8.10) 



(iH(xj(s,w),Xfc(s,t;)) < N5. 



i.ll) 



When the above normalized loglikelihood ratio is small, we say that the codewords $\mathbf x_j(\mathbf s, v)$ and $\mathbf x_k(\mathbf s, v)$ are nearly indistinguishable at the channel output. For any $p_{Y|X_1X_2} \in \mathscr{W}^{\mathrm{fair}}_{K,\delta}$, (8.10) and (8.11) describe an equivalence between statistical distinguishability of two codewords and Hamming distance.

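Step 4. The conditional distribution of each $Y_t$, $1 \le t \le N$, given $(\mathbf S, V)$, is

\[
\begin{aligned}
p_{Y_t|\mathbf S V}(y | \mathbf s, v) &= p_{Y|\mathbf S V T}(y | \mathbf s, v, t) \\
&= 2^{-2NR} \sum_{i,j \in \mathcal M_N} \sum_{x_1, x_2 \in \mathcal X} p_{X_iX_j|\mathbf S V T}(x_1, x_2 | \mathbf s, v, t)\, p_{Y|X_1X_2}(y | x_1, x_2) \\
&= \sum_{x_1, x_2 \in \mathcal X} p_{X|\mathbf S V T}(x_1 | \mathbf s, v, t)\, p_{X|\mathbf S V T}(x_2 | \mathbf s, v, t)\, p_{Y|X_1X_2}(y | x_1, x_2).
\end{aligned}
\tag{8.12}
\]

The product conditional distribution

\[ r(\mathbf y | \mathbf s, v) = \prod_{t=1}^N p_{Y_t|\mathbf S V}(y_t | \mathbf s, v) \tag{8.13} \]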

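will be used as a reference conditional p.m.f. for $\mathbf Y$ given $\mathbf S, V$ in the sequel. We also define the following conditional self-informations (i.e., mutual information for coalition $(i,j)$ averaged over $Y_t$ (resp. $\mathbf Y$) and conditioned on $\mathbf S, V$):

\[ \theta_{ij,t}(\mathbf s, v) = \sum_{y_t \in \mathcal Y} p_{Y|X_1X_2}(y_t | x_{it}(\mathbf s,v), x_{jt}(\mathbf s,v)) \log \frac{p_{Y|X_1X_2}(y_t | x_{it}(\mathbf s,v), x_{jt}(\mathbf s,v))}{p_{Y_t|\mathbf S V}(y_t | \mathbf s, v)}, \tag{8.14} \]

\[
\begin{aligned}
\theta_{ij}(\mathbf s, v) &= \frac{1}{N} \sum_{t=1}^N \theta_{ij,t}(\mathbf s, v) \\
&= \frac{1}{N} \sum_{t=1}^N \sum_{x_1, x_2, y} 1\{x_{it}(\mathbf s,v) = x_1, \; x_{jt}(\mathbf s,v) = x_2\}\, p_{Y|X_1X_2}(y | x_1, x_2) \log \frac{p_{Y|X_1X_2}(y | x_1, x_2)}{p_{Y_t|\mathbf S V}(y | \mathbf s, v)} \\
&= \sum_{t, x_1, x_2, y} p_T(t)\, p_{X_iX_j|\mathbf S V T}(x_1, x_2 | \mathbf s, v, t)\, p_{Y|X_1X_2}(y | x_1, x_2) \log \frac{p_{Y|X_1X_2}(y | x_1, x_2)}{p_{Y|\mathbf S V T}(y | \mathbf s, v, t)}.
\end{aligned}
\tag{8.15}
\]

Since $p_{Y|X_1X_2}$ is symmetric, the expressions (8.14) and (8.15) are symmetric in $i$ and $j$. The average of $\theta_{ij}(\mathbf s, v)$ over all $i, j \in \mathcal M_N$ is the conditional mutual information

\[ I(\mathbf s, v) \triangleq 2^{-2NR} \sum_{i,j \in \mathcal M_N} \theta_{ij}(\mathbf s, v) \tag{8.16} \]

\[ \phantom{I(\mathbf s, v)} = I_{p_T\, p_{X_1X_2|\mathbf S V T}\, p_{Y|X_1X_2}}(X_1, X_2; Y \,|\, \mathbf S = \mathbf s, V = v, T). \tag{8.17} \]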
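The step from (8.16) to (8.17), namely that averaging the pairwise self-informations over all ordered pairs yields a mutual information under the product input distribution, can be verified numerically. The sketch below assumes a hypothetical binary channel (XOR of the two inputs, flipped with probability `EPS`) and a random toy codebook; neither is taken from the paper.

```python
import itertools
import math
import random

EPS = 0.1  # hypothetical crossover probability

def channel(y, x1, x2):
    """Toy symmetric channel p(y | x1, x2): Y = X1 XOR X2, flipped with probability EPS."""
    return 1 - EPS if y == (x1 ^ x2) else EPS

def output_pmfs(codebook):
    """p_{Y_t}(y) for each t, as in (8.12): channel averaged over all ordered pairs."""
    m, n = len(codebook), len(codebook[0])
    return [[sum(channel(y, ci[t], cj[t]) for ci in codebook for cj in codebook) / m**2
             for y in (0, 1)] for t in range(n)]

def theta(ci, cj, p_y):
    """Conditional self-information of a codeword pair, as in (8.15), in bits."""
    n = len(ci)
    return sum(channel(y, ci[t], cj[t]) * math.log2(channel(y, ci[t], cj[t]) / p_y[t][y])
               for t in range(n) for y in (0, 1)) / n

def mutual_information(codebook, p_y):
    """I(X1, X2; Y | T): T uniform, X1 and X2 i.i.d. from the column-t empirical pmf."""
    m, n = len(codebook), len(codebook[0])
    total = 0.0
    for t in range(n):
        px = [sum(cw[t] == a for cw in codebook) / m for a in (0, 1)]
        for x1, x2, y in itertools.product((0, 1), repeat=3):
            p = channel(y, x1, x2)
            total += px[x1] * px[x2] * p * math.log2(p / p_y[t][y]) / n
    return total

random.seed(2)
N, M = 6, 8
codebook = [tuple(random.randint(0, 1) for _ in range(N)) for _ in range(M)]
p_y = output_pmfs(codebook)
avg_theta = sum(theta(ci, cj, p_y) for ci in codebook for cj in codebook) / M**2
mi = mutual_information(codebook, p_y)
print(avg_theta, mi)
```

The two printed values agree up to floating-point error, and `theta` is symmetric in its two codewords because the toy channel is symmetric in its inputs.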



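Since the average value of $\theta_{ij}(\mathbf s, v)$ is $I(\mathbf s, v)$, there cannot be too many pairs $(i,j)$ for which $\theta_{ij}(\mathbf s, v)$ is well above the mean. More precisely, there exists a symmetric subset $\mathcal A(\mathbf s, v, \delta) \subseteq \mathcal M_N^2$ of size

\[ |\mathcal A(\mathbf s, v, \delta)| \ge \frac{\delta^2}{\delta^2 + I(\mathbf s, v)}\, 2^{2NR} \ge \frac{\delta^2}{\delta^2 + \log |\mathcal Y|}\, 2^{2NR} \]

such that

\[ \forall (i,j) \in \mathcal A(\mathbf s, v, \delta) : \quad \theta_{ij}(\mathbf s, v) \le I(\mathbf s, v) + \delta^2. \tag{8.18} \]

This claim is seen to hold by contradiction. If there existed a subset $\mathcal A'(\mathbf s, v, \delta)$ of size $\frac{I(\mathbf s, v)}{\delta^2 + I(\mathbf s, v)}\, 2^{2NR}$ or larger such that

\[ \forall (i,j) \in \mathcal A'(\mathbf s, v, \delta) : \quad \theta_{ij}(\mathbf s, v) > I(\mathbf s, v) + \delta^2, \]

we would have

\[ \sum_{i,j \in \mathcal M_N} \theta_{ij}(\mathbf s, v) > (I(\mathbf s, v) + \delta^2)\, |\mathcal A'(\mathbf s, v, \delta)| \ge 2^{2NR}\, I(\mathbf s, v), \]

which would contradict (8.16).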

Moreover, the interval [0, log |3^|] is covered by the finite collection of intervals 



2 ' ' '2 
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- / 



of width (52/2, and at least one of these intervals must contain many 9ij{s, v). Specifically, for some 
integer < / < /max there must exist a subset A{s,v,6) C A{s,v,6) with the following properties: 

£ A{s,v,6) 9ij{s,v)£ei 

\9^j{s,v)-l{s,v)\<j, (8.19) 

/(s, v) = (^l+'^y-^< I(s, v) < log \y\ , (8.20) 
and ^(s, V, 6) is symmetric with size at least equal to 

To summarize, the subset $\mathcal A(\mathbf s, v, \delta) \subseteq \mathcal M_N^2$ has size nearly equal to $2^{2NR}$ and consists of the indices of the codeword pairs whose conditional self-information $\theta_{ij}(\mathbf s, v)$ is close to some $\underline I(\mathbf s, v) \le I(\mathbf s, v)$.

Recalling (8.17) and the equivalence of the representations $(\mathbf S, V)$ and $(S, Q)$, we define

\[ I(p'_S, q) = I_{p_T\, p'_S\, p_{X_1X_2|SQT}\, p_{Y|X_1X_2}}(X_1, X_2; Y \,|\, S, Q = q, T), \quad \forall p'_S \in \mathscr{P}_S, \; q \in \mathcal Q_N \tag{8.22} \]

which is a linear functional of $p'_S$ and coincides with $I(\mathbf s, v)$ in (8.17) when $p'_S = p_{\mathbf s}$.
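Step 5. Define the following subset of $\mathcal Y^N$:

\[ \mathcal T_\delta(\mathbf s, v, i, j) \triangleq \left\{ \mathbf y \in \mathcal Y^N : \left| \frac{1}{N} \sum_{t=1}^N \log \frac{p_{Y|X_1X_2}(y_t | x_{it}(\mathbf s, v), x_{jt}(\mathbf s, v))}{p_{Y_t|\mathbf S V}(y_t | \mathbf s, v)} - \theta_{ij}(\mathbf s, v) \right| \le \frac{\delta^2}{4} \right\} \tag{8.23} \]

which satisfies the symmetry property $\mathcal T_\delta(\mathbf s, v, i, j) = \mathcal T_\delta(\mathbf s, v, j, i)$.

We show that $\mathcal T_\delta(\mathbf s, v, i, j)$ is a typical set for $\mathbf Y$ conditioned on $\mathbf S = \mathbf s$, $V = v$, and $\mathcal K = \{i,j\}$, in the following sense:

\[ \Pr[\mathbf Y \notin \mathcal T_\delta(\mathbf s, v, i, j) \mid \mathbf S = \mathbf s, V = v, \mathcal K = \{i,j\}] \le \frac{16 \log^2 \delta}{N \delta^4}, \quad \forall \mathbf s, v, i, j \tag{8.24} \]

vanishes as $N \to \infty$. Indeed, by (8.23) we may write

\[ \Pr[\mathbf Y \notin \mathcal T_\delta(\mathbf s, v, i, j) \mid \mathbf S = \mathbf s, V = v, \mathcal K = \{i,j\}] = \Pr\left[ |\hat\theta_{ij}(\mathbf s, v) - \theta_{ij}(\mathbf s, v)| > \frac{\delta^2}{4} \,\Big|\, \mathbf S = \mathbf s, V = v, \mathcal K = \{i,j\} \right] \tag{8.25} \]

where

\[ \hat\theta_{ij}(\mathbf s, v) = \frac{1}{N} \sum_{t=1}^N \log \frac{p_{Y|X_1X_2}(Y_t | x_{it}(\mathbf s, v), x_{jt}(\mathbf s, v))}{p_{Y_t|\mathbf S V}(Y_t | \mathbf s, v)}. \tag{8.26} \]

Since $Y_t$, $1 \le t \le N$, are conditionally independent given $\mathbf S, V, \mathcal K$, $\hat\theta_{ij}(\mathbf s, v)$ is the average of $N$ random variables that are conditionally independent given $\mathbf S = \mathbf s$, $V = v$, $\mathcal K = \{i,j\}$. Recalling (8.14), the conditional expectation of these random variables is given by

\[ \mathbb E_{Y_t|\mathbf S V \mathcal K}\left[ \log \frac{p_{Y|X_1X_2}(Y_t | x_{it}(\mathbf s, v), x_{jt}(\mathbf s, v))}{p_{Y_t|\mathbf S V}(Y_t | \mathbf s, v)} \right] = \theta_{ij,t}(\mathbf s, v), \quad 1 \le t \le N, \tag{8.27} \]

and averaging (8.27) over $t$ yields $\mathbb E_{\mathbf Y|\mathbf S V \mathcal K}(\hat\theta_{ij}(\mathbf s, v)) = \theta_{ij}(\mathbf s, v)$. The conditional variances of these random variables are

\[ \zeta_t(\mathbf s, v, i, j) = \mathrm{var}_{Y_t|\mathbf S V \mathcal K}\left[ \log \frac{p_{Y|X_1X_2}(Y_t | x_{it}(\mathbf s, v), x_{jt}(\mathbf s, v))}{p_{Y_t|\mathbf S V}(Y_t | \mathbf s, v)} \right], \quad 1 \le t \le N. \]

By our assumption (8.8) that $p_{Y|X_1X_2}(y | x_1, x_2) \ge \delta$ for every $y, x_1, x_2$, the argument of the log above is in the range $[\delta, 1/\delta]$. Hence $\zeta_t(\mathbf s, v, i, j) \le \log^2 \delta$, and

\[ \mathrm{var}_{\mathbf Y|\mathbf S V \mathcal K}(\hat\theta_{ij}(\mathbf s, v)) = \frac{1}{N^2} \sum_{t=1}^N \zeta_t(\mathbf s, v, i, j) \le \frac{\log^2 \delta}{N}. \]

By Chebyshev's inequality, the probability of (8.25) is upper-bounded by

\[ \Pr_{\mathbf Y|\mathbf S V \mathcal K}\left[ |\hat\theta_{ij}(\mathbf s, v) - \theta_{ij}(\mathbf s, v)| > \frac{\delta^2}{4} \right] \le \frac{\mathrm{var}_{\mathbf Y|\mathbf S V \mathcal K}(\hat\theta_{ij}(\mathbf s, v))}{(\delta^2/4)^2} \le \frac{16 \log^2 \delta}{N \delta^4}. \]

Combining this inequality with (8.25) establishes (8.24). By averaging over $\mathbf S, V, \mathcal K$, it follows from (8.24) that

\[ \Pr[\mathbf Y \notin \mathcal T_\delta(\mathbf S, V, \mathcal K)] \le \frac{16 \log^2 \delta}{N \delta^4}. \tag{8.28} \]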
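To get a feel for how slowly the Chebyshev bound (8.28) decays, the sketch below evaluates $16 \log^2 \delta / (N \delta^4)$ and the blocklength needed to push it below a target. It assumes base-2 logarithms (the base only changes a constant factor); the function names are illustrative.

```python
import math

def atypical_bound(n, delta):
    """Right side of (8.28): 16 * log2(delta)^2 / (n * delta^4)."""
    return 16 * math.log2(delta) ** 2 / (n * delta ** 4)

def blocklength_for(delta, target):
    """Smallest integer N for which the bound is at most `target`."""
    return math.ceil(16 * math.log2(delta) ** 2 / (target * delta ** 4))

for delta in (0.2, 0.1, 0.05):
    n_needed = blocklength_for(delta, target=0.01)
    print(delta, n_needed, atypical_bound(n_needed, delta))
```

The required blocklength grows roughly like $\delta^{-4}$, which is harmless for the converse because $N \to \infty$ is taken before $\delta \downarrow 0$.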
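Step 6. Define the following sets:

\[ \mathcal A(\mathbf s, v, i, \delta) \triangleq \{j \in \mathcal M_N : (i,j) \in \mathcal A(\mathbf s, v, \delta)\}, \tag{8.29} \]

\[ \mathcal M^{\mathcal A\text{-good}}(\mathbf s, v, i, \delta) \triangleq \mathcal M^{\mathrm{good}}(\mathbf s, v, \delta) \cap \mathcal A(\mathbf s, v, i, \delta). \tag{8.30} \]

So $j \in \mathcal M^{\mathcal A\text{-good}}(\mathbf s, v, i, \delta)$ implies that $(i,j) \in \mathcal A(\mathbf s, v, \delta)$, and $j$ is a "good" index in the sense of (8.7).

We show that the typical sets $\mathcal T_\delta(\mathbf s, v, i, j)$, $j \in \mathcal M^{\mathcal A\text{-good}}(\mathbf s, v, i, \delta)$, have weak overlap for any fixed $\mathbf s, v, i$. Define the overlap factor of the good sets

\[ M_\delta(\mathbf y, \mathbf s, v, i) \triangleq \sum_{j \in \mathcal M^{\mathcal A\text{-good}}(\mathbf s, v, i, \delta)} 1\{\mathbf y \in \mathcal T_\delta(\mathbf s, v, i, j)\}. \tag{8.31} \]

If $\mathbf y \in \mathcal T_\delta(\mathbf s, v, i, j) \cap \mathcal T_\delta(\mathbf s, v, i, k)$ for some $j, k \in \mathcal M^{\mathcal A\text{-good}}(\mathbf s, v, i, \delta)$, then

\[
\begin{aligned}
\frac{1}{N} \left| \log \frac{p^N_{Y|X_1X_2}(\mathbf y | \mathbf x_i(\mathbf s, v), \mathbf x_j(\mathbf s, v))}{p^N_{Y|X_1X_2}(\mathbf y | \mathbf x_i(\mathbf s, v), \mathbf x_k(\mathbf s, v))} \right|
&\overset{(a)}{\le} 2 \times \frac{\delta^2}{4} + |\theta_{ij}(\mathbf s, v) - \theta_{ik}(\mathbf s, v)| \\
&\overset{(b)}{\le} \delta^2
\end{aligned}
\]

where inequality (a) follows from (8.23) and (b) from (8.19) and the fact that both $(i,j)$ and $(i,k)$ are in $\mathcal A(\mathbf s, v, \delta)$. By (8.11), this implies $\frac{1}{N} d_H(\mathbf x_j(\mathbf s, v), \mathbf x_k(\mathbf s, v)) \le \delta$. Hence

\[ 1\{\mathbf y \in \mathcal T_\delta(\mathbf s, v, i, j) \cap \mathcal T_\delta(\mathbf s, v, i, k)\} \le 1\{d_H(\mathbf x_j(\mathbf s, v), \mathbf x_k(\mathbf s, v)) \le N\delta\}\; 1\{\mathbf y \in \mathcal T_\delta(\mathbf s, v, i, j)\}, \quad \forall j, k \in \mathcal M^{\mathcal A\text{-good}}(\mathbf s, v, i, \delta). \]

Summing over $k \in \mathcal M^{\mathcal A\text{-good}}(\mathbf s, v, i, \delta)$ yields

\[
\begin{aligned}
\sum_{k} 1\{\mathbf y \in \mathcal T_\delta(\mathbf s, v, i, j) \cap \mathcal T_\delta(\mathbf s, v, i, k)\}
&\overset{(a)}{\le} |\mathcal M_j(\mathbf s, v, \delta)|\; 1\{\mathbf y \in \mathcal T_\delta(\mathbf s, v, i, j)\} \\
&\overset{(b)}{\le} 2^{2N\sqrt{\delta}}\; 1\{\mathbf y \in \mathcal T_\delta(\mathbf s, v, i, j)\}
\end{aligned}
\]

where inequality (a) results from (8.5), and (b) from (8.7) and the fact that $j \in \mathcal M^{\mathrm{good}}(\mathbf s, v, \delta)$. Squaring both sides of (8.31) and applying the inequality above, we obtain

\[
\begin{aligned}
M_\delta^2(\mathbf y, \mathbf s, v, i) &= \sum_{j,k} 1\{\mathbf y \in \mathcal T_\delta(\mathbf s, v, i, j)\}\; 1\{\mathbf y \in \mathcal T_\delta(\mathbf s, v, i, k)\} \\
&\le 2^{2N\sqrt{\delta}} \sum_{j} 1\{\mathbf y \in \mathcal T_\delta(\mathbf s, v, i, j)\} \\
&= 2^{2N\sqrt{\delta}}\, M_\delta(\mathbf y, \mathbf s, v, i).
\end{aligned}
\]

Hence the overlap factor of the good sets is upper-bounded as

\[ M_\delta(\mathbf y, \mathbf s, v, i) \le 2^{2N\sqrt{\delta}}. \tag{8.32} \]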

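Step 7. Define the following typical set for $\mathbf S$:

\[ \mathcal T_\delta = \{\mathbf s : d_V(p_{\mathbf s}, p_S) \le \delta\} \tag{8.33} \]

where $d_V(p, p') = \sum_s |p(s) - p'(s)|$ denotes the variational distance between two p.m.f.'s $p$ and $p'$. We have the inequality

\[
\begin{aligned}
\Pr[\mathbf S \notin \mathcal T_\delta] &= \sum_{T_{\mathbf s} \,:\, d_V(p_{\mathbf s}, p_S) > \delta} \Pr[T_{\mathbf s}] \\
&\overset{(a)}{\le} \sum_{p_{\mathbf s} \,:\, d_V(p_{\mathbf s}, p_S) > \delta} 2^{-N D(p_{\mathbf s} \| p_S)} \\
&\overset{(b)}{\le} (N+1)^{|\mathcal S|} \max_{p_{\mathbf s} \,:\, d_V(p_{\mathbf s}, p_S) > \delta} 2^{-N D(p_{\mathbf s} \| p_S)} \\
&\overset{(c)}{\le} (N+1)^{|\mathcal S|} \max_{p_{\mathbf s} \,:\, D(p_{\mathbf s} \| p_S) > \delta^2/\ln 4} 2^{-N D(p_{\mathbf s} \| p_S)} \\
&\le (N+1)^{|\mathcal S|}\, 2^{-N \delta^2/\ln 4}
\end{aligned}
\tag{8.34}
\]

where in (a) we have used the upper bound of [11, p. 32] on the probability of a type class, in (b) the fact that the number of type classes $T_{\mathbf s}$ is at most $(N+1)^{|\mathcal S|}$ [11], and in (c) Pinsker's inequality $D(p \| q) \ge d_V^2(p, q)/\ln 4$ [11, p. 58].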
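Both ingredients of (8.34) are easy to probe numerically: Pinsker's inequality in bits, $D(p\|q) \ge d_V^2(p,q)/\ln 4$, and the final bound $(N+1)^{|\mathcal S|} 2^{-N\delta^2/\ln 4}$. The sketch below checks the first on random p.m.f.'s and evaluates the second in the log domain to avoid overflow; the function names and the random test distributions are assumptions made for illustration.

```python
import math
import random

def variational_distance(p, q):
    """d_V(p, q) = sum_s |p(s) - q(s)| (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def kl_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be strictly positive."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def atypical_host_bound(n, alphabet_size, delta):
    """(n + 1)^{|S|} * 2^{-n delta^2 / ln 4}, computed via its base-2 exponent."""
    exponent = alphabet_size * math.log2(n + 1) - n * delta ** 2 / math.log(4)
    return 2.0 ** exponent

def random_pmf(k, rng):
    weights = [rng.random() + 0.01 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(3)
worst_slack = float("inf")
for _ in range(1000):
    p, q = random_pmf(4, rng), random_pmf(4, rng)
    slack = kl_bits(p, q) - variational_distance(p, q) ** 2 / math.log(4)
    worst_slack = min(worst_slack, slack)
print(worst_slack)
print(atypical_host_bound(10**5, 2, 0.1))
```

The polynomial prefactor $(N+1)^{|\mathcal S|}$ makes the bound vacuous at small $N$; it only becomes useful once $N \delta^2$ dominates $|\mathcal S| \log N$.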

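Applying successively (8.22) and (8.33), we have

\[
\begin{aligned}
|I(p_{\mathbf s}, q) - I(p_S, q)| &= \left| \sum_{s} (p_{\mathbf s}(s) - p_S(s))\, I(X_1, X_2; Y | S = s, Q = q, T) \right| \\
&\le \delta \max_{s} I(X_1, X_2; Y | S = s, Q = q, T) \\
&\le \delta \log |\mathcal Y|, \quad \forall \mathbf s \in \mathcal T_\delta, \; q \in \mathcal Q_N.
\end{aligned}
\tag{8.35}
\]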



Step 8. Given /n, gN,PY\Xi,X2' '^^ ^^^^ interested in several conditional probabilities that 
correct decoding occurs in conjunction with the typical events Y G Ts{S, V,IC) and S £ Ts. Define 
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the following short hands: 

P_c{i,j\s, v) = Pr [correct decoding and Y £ Ts{s, v, i, j)\IC = {i, j}, S = s,V = v] 

yeTs {s,v,i,j)n{Vi {s,v)UVj {s,v)) 

E^i^ii) = Pr [correct decoding and Y G T^{S^ V^hj) and S E 7^ | /C = 

Since randomly permuted codes are used, both P^{i,j) and P_c{i,3\Ps) (which are error probabilities 
averaged over V) are the same for all i,j £ Aijy. The probability of correct decoding satisfies 

yijGMN ■ PcifN,gN,PY\XiX2) = -Pr-[correct decoding I /C = 

< Pr[S i U + Pr[Y i T5(S, V,K)\ +P,{i,j). (8.36) 



The first and second terms in the right side are upper-bounded by (j8.34p and (|8.28p . respectively. 

'at- 



The third term can be upper bounded as follows for any B C 7W?r: 



< maxmax -^Pg(5|s, f). (8.37) 

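Define the following subsets of $\mathcal A(\mathbf s, v, \delta)$:

\[ \mathcal A^{\mathrm{bad}}(\mathbf s, v, \delta) = \{(i,j) \in \mathcal A(\mathbf s, v, \delta) : |\mathcal M_i(\mathbf s, v, \delta)|, |\mathcal M_j(\mathbf s, v, \delta)| > 2^{2N\sqrt{\delta}}\}, \tag{8.38} \]

\[ \mathcal A^{\mathrm{good}}(\mathbf s, v, \delta) = \mathcal A(\mathbf s, v, \delta) \cap (\mathcal M^{\mathrm{good}}(\mathbf s, v, \delta))^2. \tag{8.39} \]

Note that $\mathcal A^{\mathrm{good}}(\mathbf s, v, \delta) \cup \mathcal A^{\mathrm{bad}}(\mathbf s, v, \delta)$ is generally a strict subset of $\mathcal A(\mathbf s, v, \delta)$ and that both $\mathcal A^{\mathrm{bad}}(\mathbf s, v, \delta)$ and $\mathcal A^{\mathrm{good}}(\mathbf s, v, \delta)$ are symmetric in $(i,j)$. Since $|\mathcal A(\mathbf s, v, \delta)|$ given in (8.21) is greater than a constant times $2^{2NR}$, at least one of the following inequalities must hold:

\[ |\mathcal A^{\mathrm{good}}(\mathbf s, v, \delta)| \ge 2^{N(2R - \delta^2)}, \qquad |\mathcal A^{\mathrm{bad}}(\mathbf s, v, \delta)| \ge 2^{N(2R - \delta^2)}. \tag{8.40} \]

We now apply (8.37) twice, respectively using $\mathcal A^{\mathrm{good}}(\mathbf s, v, \delta)$ and $\mathcal A^{\mathrm{bad}}(\mathbf s, v, \delta)$ in place of $\mathcal B$.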
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|^s°°<i(s, 


v,6)\ 


1 




|^g°°<i(s, 


v,6)\ 


2 




|^s°°<i(s, 


v,6)\ 



Case I : \A^°°'^{s,v,S)\ > 2^(2^-'^"). Choosing B = A^°'"^{s,v,6) in (18371) . we have 

Yl Yl Py|XiX2(y|xi(s,t;),Xj(s,t;)) 

(j,j)e^*^°°'*(s.'".5) y'^Ts{s,v,i,j)n{v,{s,v)uVj{s,v)) 

^ X] Py|XiX2(y|xi(s,t^),Xj-(s,v)) 

< 2-^(2^-.^)+! ^ y: p^|;,,^,(y|x.(s,.),x,(s,.)) 

nJVB 

Cfe~) ^ 

y 2-^(2«-.^Hi^ ^ ^ p^l^^^^(y|x.(s,.),x,(s,.)) 

2iVJi 

(J 2-^(2^J-52-As,»^)-5'-5V4)+i ^ ^ ^ r(y|s,v) 

^ ^-Ni2B-iis,.y9sy,Hi Y Y E l{ye'r,(s,.,i,j)}r(y|s,.) 

i = l y<^Vi(s,v) 



r,NR 

^11 2-^(2i?-/(s,.)-95V4)+i ^ Y r(y|s,i;) 2^2v^ 



=1 

^ 2-^(2i?-/(s,i,)-352-2v^) (8_41) 

3(52 

where (a) holds because the decoding sets Pj(s, f) are disjoint, and because of the symme- 
try of py|XiX2i -^(s,'y,<5), and Ts{s,v,i, j); (b) holds because A>^°°'^{s,v,6) C {(i,j) : j £ 
M-^~^"°'^is,v,i,6)}; (c) follows from Km and (l833|) : and (d), (e), and (f) follow from KWi . 
IKTl\i . and (18321) . respectively 

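Now maximizing (8.41) over $\mathbf s \in \mathcal T_\delta$ and $v \in \mathcal V_N$ amounts to maximizing $I(\mathbf s, v)$ in the exponent. In view of the equivalence of the representations $(\mathbf s, v)$ and $(p_{\mathbf s}, q)$, we have

\[
\begin{aligned}
\max_{\mathbf s \in \mathcal T_\delta} \max_{v \in \mathcal V_N} I(\mathbf s, v) &= \max_{p_{\mathbf s} \,:\, d_V(p_{\mathbf s}, p_S) \le \delta} \; \max_{q \in \mathcal Q_N} I(p_{\mathbf s}, q) \\
&\le \max_{q \in \mathcal Q_N} I(p_S, q) + \delta \log |\mathcal Y|
\end{aligned}
\]

where the inequality follows from (8.35). Combining with (8.41), we obtain

\[ \max_{\mathbf s \in \mathcal T_\delta} \max_{v \in \mathcal V_N} \frac{1}{|\mathcal A^{\mathrm{good}}(\mathbf s, v, \delta)|}\, \underline{P}_c(\mathcal A^{\mathrm{good}} | \mathbf s, v) \le 2^{-N(2R - \max_q I(p_S, q) - \delta \log |\mathcal Y| - 3\delta^2 - 2\sqrt{\delta})}. \tag{8.42} \]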

Case II : \A^'''^{s,v,6)\ > 2^(2«-'5'). Choosing B = A^'^{s,v,6) in (lOTll . we have 

" Ubad(s ^ E ;>y|M(y|xi(s,^;),x,(s,t;)) 

^ ^-Ni2R-S^) ^ ^ <l^^^^(y|x,(s,^),x,(s,^)). (8.43) 

where in the last line we have dropped the restriction y G Ts{s,v,i, j). On the other hand we have 

2NR 

1 = 2-2^^ 5^ 5^ Yl P^^|x,x.(y|xi(s,^),x,(s,t;)) 

i,k=l j=l y(^Vj{s,v) 
2NR 

2 = 2-2^« ^ p{^l^^^^(y|x,(s,^;),x,(s,^;)) 

i,j,k=l yGOj (s,i))Ul'j (s,i)) 

> 2-2^« E E E P^|x.x.(y|x.(s,.),x,(s,.)) 

> 2-^v/^2-^^ E E E P^^|x.x.(y|x.(s,.),x,(s,.)) 
= 2-^v^2-2^« Yl E pV,x.(y|x.(s,i;),x,(s,t;)) 

> 2-^v^2^2v^2-2^^ 5; p^l^^^^(y|x,(s,.),x,(s,.)) (8.44) 

(i,j)eA^'^'^{s,v,5) y£Vi{s,v)UVj{s,v) 

where (a) is obtained restricting the ranges of k, (b) follows from (j8.5p . (j8.10p . and the fact that 
k £ Mj{s,v,5), and (c) holds because \Mj{s,v,6)\ > 2^"^^ for all j £ M^'^{s,v,S). The double 
sums in ()8.43p and ()8.44p are identical, and thus 



\A^^'^{s,v,5)\ 



P^iA''^ \s,v) < 2-^^^"-" ^ (8.45) 
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Combining Cases I and II, we obtain from (j8.37p 

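\[ \forall i, j \in \mathcal M_N : \quad \underline{P}_c(i,j) \le \max\left\{ 2^{-N(2R - \max_q I(p_S, q) - \delta \log |\mathcal Y| - 3\delta^2 - 2\sqrt{\delta})}, \; 2^{-N(\sqrt{\delta} - \delta^2)} \right\}. \]

Substituting into (8.36) and using the bounds (8.34) and (8.28), we obtain

\[ P_c(f_N, g_N, p_{Y|X_1X_2}) \le (N+1)^{|\mathcal S|}\, 2^{-N\delta^2/\ln 4} + \frac{16 \log^2 \delta}{N \delta^4} + \max\left\{ 2^{-N(2R - \max_q I(p_S, q) - \delta \log |\mathcal Y| - 3\delta^2 - 2\sqrt{\delta})}, \; 2^{-N(\sqrt{\delta} - \delta^2)} \right\}. \tag{8.46} \]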

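Step 9. We now bound $\max_q I(p_S, q)$ in (8.46) by a quantity that does not depend on $N$. Since

\[ I_{p_Q\, p_S\, p_T\, p_{X_1X_2|SQT}\, p_{Y|X_1X_2}}(X_1, X_2; Y | S, Q, T) = \sum_{q} p_Q(q)\, I(p_S, q), \]

we have

\[
\begin{aligned}
\max_{q} I(p_S, q) &= \max_{p_Q} I(X_1, X_2; Y | S, Q, T) \\
&\overset{(a)}{\le} \max_{p_{QT}} I(X_1, X_2; Y | S, Q, T) \\
&\overset{(b)}{=} \max_{p_W \in \mathscr{P}_{\mathcal W_N}} I(X_1, X_2; Y | S, W)
\end{aligned}
\tag{8.47}
\]

where (a) holds because the maximization is over a larger domain ($p_{QT}$ is now unconstrained over $\mathcal W_N = \mathcal Q_N \times \{1, 2, \cdots, N\}$), and (b) is obtained by defining the random variable $W = (Q, T) \in \mathcal W_N$. Moreover

\[
\begin{aligned}
\max_{p_W \in \mathscr{P}_{\mathcal W_N}} I(X_1, X_2; Y | S, W) &\overset{(a)}{\le} \sup_{L} \max_{p_W \in \mathscr{P}_W} I(X_1, X_2; Y | S, W) \\
&\overset{(b)}{=} \lim_{L \to \infty} \max_{p_W \in \mathscr{P}_W} I(X_1, X_2; Y | S, W)
\end{aligned}
\tag{8.48}
\]

where the alphabet for $W$ in (a) and (b) is $\{1, 2, \cdots, L\}$, and the supremum and the limit are equal because the supremand is nondecreasing in $L$.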
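Combining (8.46), (8.47), and (8.48), we conclude that

\[ P_c^*(f_N, g_N, \mathscr{W}^{\mathrm{fair}}_{K,\delta}) = \min_{p_{Y|X_1X_2} \in \mathscr{W}^{\mathrm{fair}}_{K,\delta}} P_c(f_N, g_N, p_{Y|X_1X_2}) \]

vanishes as $N \to \infty$ for all $\delta \in (0, 1/|\mathcal Y|]$ and all sequences of codes $(f_N, g_N)$ of rate

\[ R > \frac{1}{2} \left[ \min_{p_{Y|X_1X_2} \in \mathscr{W}^{\mathrm{fair}}_{K,\delta}} \; \lim_{L \to \infty} \; \max_{p_W \in \mathscr{P}_W} I(X_1, X_2; Y | S, W) + \delta \log |\mathcal Y| + 3\delta^2 + 2\sqrt{\delta} \right]. \]

Letting $\delta \downarrow 0$, we conclude that reliable decoding is possible only if

\[
\begin{aligned}
R &\le \min_{p_{Y|X_1X_2} \in \mathscr{W}^{\mathrm{fair}}_K} \; \lim_{L \to \infty} \; \max_{p_W \in \mathscr{P}_W} \frac{1}{2} I(X_1, X_2; Y | S, W) \\
&= \lim_{L \to \infty} \; \min_{p_{Y|X_1X_2} \in \mathscr{W}^{\mathrm{fair}}_K} \; \max_{p_W \in \mathscr{P}_W} \frac{1}{2} I(X_1, X_2; Y | S, W) \\
&= \lim_{L \to \infty} \; \max_{p_W \in \mathscr{P}_W} \; \min_{p_{Y|X_1X_2} \in \mathscr{W}^{\mathrm{fair}}_K} \frac{1}{2} I(X_1, X_2; Y | S, W)
\end{aligned}
\]

where the second equality holds by application of the minimax theorem: the mutual information functional is linear (hence concave) in $p_W$ and convex in $p_{Y|X_1X_2}$, and the domains of $p_W$ and $p_{Y|X_1X_2}$ are convex. Since the above inequality holds for all feasible $p_{X|SW}$, we obtain

\[ R \le \lim_{L \to \infty} \; \max_{p_{X_1X_2W|S} \in \mathscr{P}_{X_1X_2W|S}(p_S, L, D_1)} \; \min_{p_{Y|X_1X_2} \in \mathscr{W}^{\mathrm{fair}}_K(p_{X_1X_2})} \frac{1}{2} I(X_1, X_2; Y | S, W). \]

This concludes the proof in case $\mathscr{W}_K$ is independent of $p_{\mathbf x_{\mathsf K}}$.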

Step 10. It remains to prove the claim in the case $\mathscr{W}_K$ depends on the joint type $p_{\mathbf x_{\mathsf K}} = Z \in \mathcal Z = \mathscr{P}^{[N]}_{X_{\mathsf K}}$. The approach parallels that used under the detect-all criterion (last step of the proof of Theorem 3.3). A helper provides $Z$ to the decoder. The decoding regions now take the form $\mathcal D_i(\mathbf s, v, z)$, $i \in \mathcal M_N$. We similarly define $r(\mathbf y | \mathbf s, v, z)$, $\theta_{ij}(\mathbf s, v, z)$, $I(\mathbf s, v, z)$, and $\mathcal T_\delta(\mathbf s, v, z, i, j)$, by extension of the definitions (8.13), (8.15), (8.17), and (8.23). The memoryless channel selected by the coalition is now denoted by $p_{Y|X_1X_2Z}$ and belongs to the feasible set of (7.9). The correct decoding probability in (8.46) is denoted by $P_c(f_N, g_N, p_{Y|X_1X_2Z})$, and the mutual information in the right side takes the form

\[ \max_{q} I(p_S, q) \le \lim_{L \to \infty} \max_{p_W \in \mathscr{P}_W} I(X_1, X_2; Y | S, W, Z). \]

We conclude that the minimum of $P_c(f_N, g_N, p_{Y|X_1X_2Z})$ over the feasible set of (7.9) vanishes as $N \to \infty$ for all sequences of codes $(f_N, g_N)$ of rate

\[ R > \lim_{L \to \infty} \; \max_{p_W \in \mathscr{P}_W} \; \min_{p_{Y|X_1X_2Z}} \frac{1}{2} I(X_1, X_2; Y | S, W, Z), \]

where the minimum is again over the feasible set of (7.9). Proceeding similarly to the derivation of (7.11) proves the claim. □

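9 Proof of Theorem 4.1

We derive the error exponents for the threshold decision rule (4.1). By symmetry of the codebook construction, the error probabilities will be independent of $\mathcal K$. Without loss of optimality, we assume that $\mathcal K = \mathsf K = \{1, 2, \cdots, K\}$. Recalling that $\mathcal W = \{1, 2, \cdots, L\}$, denote by $\mathscr{P}^{[N]}_{XW}(L)$ the set of joint types over $\mathcal X \times \mathcal W$. Define

\[ \mathscr{P}^{[N]}_{YX_{\mathsf K}|W}(p_{\mathbf{xw}}, \mathscr{W}_K, R, L, m) = \{ p_{\mathbf y \mathbf x_{\mathsf K}|\mathbf w} : p_{\mathbf x_{\mathsf K}|\mathbf w} \in \mathscr{M}(p_{\mathbf x|\mathbf w}), \; p_{\mathbf y|\mathbf x_{\mathsf K}} \in \mathscr{W}_K(p_{\mathbf x_{\mathsf K}}), \; I(\mathbf x_m; \mathbf y | \mathbf w) \le R \}, \]

\[ E_{psp,m,N}(R, L, p_{\mathbf{xw}}, \mathscr{W}_K) = \min_{p_{\mathbf y \mathbf x_{\mathsf K}|\mathbf w} \in \mathscr{P}^{[N]}_{YX_{\mathsf K}|W}(p_{\mathbf{xw}}, \mathscr{W}_K, R, L, m)} D(p_{\mathbf y \mathbf x_{\mathsf K}|\mathbf w} \,\|\, p_{\mathbf y|\mathbf x_{\mathsf K}}\, p^K_{\mathbf x|\mathbf w} \,|\, p_{\mathbf w}), \tag{9.1} \]

\[ \overline{E}_{psp,N}(R, L, p_{\mathbf{xw}}, \mathscr{W}_K) = \max_{m \in \mathsf K} E_{psp,m,N}(R, L, p_{\mathbf{xw}}, \mathscr{W}_K), \tag{9.2} \]

\[ \underline{E}_{psp,N}(R, L, p_{\mathbf{xw}}, \mathscr{W}_K) = \min_{m \in \mathsf K} E_{psp,m,N}(R, L, p_{\mathbf{xw}}, \mathscr{W}_K), \tag{9.3} \]

and

\[ E_{psp,N}(R, L, \mathscr{W}_K) = \max_{p_{\mathbf{xw}} \in \mathscr{P}^{[N]}_{XW}(L)} E_{psp,1,N}(R, L, p_{\mathbf{xw}}, \mathscr{W}^{\mathrm{fair}}_K). \tag{9.4} \]

Denote by $p^*_{\mathbf{xw}}$ the maximizer above (which implicitly depends on $R$) and by $T^*_{\mathbf{xw}}$ the corresponding type class. Let

\[ \overline{E}_{psp,N}(R, L, \mathscr{W}_K) = \overline{E}_{psp,N}(R, L, p^*_{\mathbf{xw}}, \mathscr{W}_K), \tag{9.5} \]

\[ \underline{E}_{psp,N}(R, L, \mathscr{W}_K) = \underline{E}_{psp,N}(R, L, p^*_{\mathbf{xw}}, \mathscr{W}_K). \tag{9.6} \]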



The expressions (9.1)-(9.6) differ from (4.3)-(4.8) in that the optimizations are performed over types instead of general p.m.f.'s. We have

\[ \lim_{N \to \infty} \overline{E}_{psp,N}(R, L, \mathscr{W}_K) = \overline{E}_{psp}(R, L, \mathscr{W}_K), \tag{9.7} \]

\[ \lim_{N \to \infty} \underline{E}_{psp,N}(R, L, \mathscr{W}_K) = \underline{E}_{psp}(R, L, \mathscr{W}_K) \tag{9.8} \]

by continuity of the divergence and mutual-information functionals.

With the joint type class $T^*_{\mathbf{xw}}$ specified below (9.4), we now restate the coding and decoding scheme.

Codebook. A random constant-composition code $\mathcal C(\mathbf w) = \{\mathbf x_m, 1 \le m \le 2^{NR}\}$ is generated for each $\mathbf w \in T^*_{\mathbf w}$ by drawing $2^{NR}$ sequences independently and uniformly from the conditional type class $T^*_{\mathbf x|\mathbf w}$.

Encoder. A sequence $\mathbf w$ is drawn uniformly from $T^*_{\mathbf w}$ and shared with the receiver. User $m$ is assigned codeword $\mathbf x_m$ from $\mathcal C(\mathbf w)$, for $1 \le m \le 2^{NR}$.

Decoder. Given $(\mathbf y, \mathbf w)$, the decoder places user $m$ on the guilty list if $I(\mathbf x_m; \mathbf y | \mathbf w) > R + \Delta$.
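The decoder above only needs the empirical conditional mutual information of each codeword with the received sequence. The following is a minimal sketch of that rule, assuming sequences are given as equal-length Python lists; `empirical_cond_mi` and `threshold_decoder` are illustrative names, and the toy sequences are hand-built rather than drawn from a constant-composition code.

```python
import math
from collections import Counter

def empirical_cond_mi(x, y, w):
    """Empirical conditional mutual information I(x; y | w) in bits, from the joint type."""
    n = len(x)
    c_xyw = Counter(zip(x, y, w))
    c_xw = Counter(zip(x, w))
    c_yw = Counter(zip(y, w))
    c_w = Counter(w)
    return sum(c / n * math.log2(c * c_w[k] / (c_xw[(a, k)] * c_yw[(b, k)]))
               for (a, b, k), c in c_xyw.items())

def threshold_decoder(codebook, y, w, rate, margin):
    """Accuse every user m whose codeword satisfies I(x_m; y | w) > rate + margin."""
    return [m for m, x in enumerate(codebook)
            if empirical_cond_mi(x, y, w) > rate + margin]

# Toy example: user 0 is the sole colluder and forwards its codeword unchanged.
w = [0] * 100
x0 = [0, 1] * 50
x1 = [0, 0, 1, 1] * 25  # empirically independent of x0
y = list(x0)
accused = threshold_decoder([x0, x1], y, w, rate=0.1, margin=0.2)
print(accused)
```

Here `margin` plays the role of $\Delta$. Each user is tested separately, so the decoder needs no knowledge of the coalition size, and the accused list may be empty.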

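Collusion Channel. The random code described above is a RM code. By Prop. 2.2 it is sufficient to restrict our attention to strongly exchangeable collusion channels for the error probability analysis. Recall from (2.15) and (2.16) that for such channels,

\[ p_{\mathbf Y|\mathbf X_{\mathsf K}}(\mathbf y | \mathbf x_{\mathsf K}) = \frac{\Pr[T_{\mathbf y|\mathbf x_{\mathsf K}}]}{|T_{\mathbf y|\mathbf x_{\mathsf K}}|} \le \frac{1}{|T_{\mathbf y|\mathbf x_{\mathsf K}}|}\, 1\{p_{\mathbf y|\mathbf x_{\mathsf K}} \in \mathscr{W}_K(p_{\mathbf x_{\mathsf K}})\}, \quad \forall \mathbf y \in T_{\mathbf y|\mathbf x_{\mathsf K}}. \tag{9.9} \]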
Error Exponents. The derivation is based on the following two asymptotic equalities, which are special cases of (10.12) and (10.16) proven later.

1) Fix $\mathbf w$ and $\mathbf y$ and draw $\mathbf x$ uniformly from a fixed conditional type class $T^*_{\mathbf x|\mathbf w}$, independently of $\mathbf y$. Then for any $\nu > 0$,

\[ \Pr[I(\mathbf x; \mathbf y | \mathbf w) \ge \nu] \;\dot{\le}\; 2^{-N\nu}. \tag{9.10} \]

2) Fix $\mathbf w$, draw $\mathbf x_m$, $m \in \mathsf K$, i.i.d. uniformly from a fixed conditional type class $T^*_{\mathbf x|\mathbf w}$, and then draw $\mathbf Y$ uniformly from the type class $T_{\mathbf y|\mathbf x_{\mathsf K}}$. For any strongly exchangeable collusion channel, for any $m \in \mathsf K$ and $\nu > 0$, we have

\[ \Pr[I(\mathbf x_m; \mathbf y | \mathbf w) \le \nu] \doteq \exp_2\{-N E_{psp,m,N}(\nu, L, p^*_{\mathbf{xw}}, \mathscr{W}_K)\}. \tag{9.11} \]

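(i). False Positives. A false positive occurs if

\[ \exists m \notin \mathsf K : \quad I(\mathbf x_m; \mathbf y | \mathbf w) > R + \Delta. \tag{9.12} \]

By construction of the codebook, $\mathbf x_m$ is conditionally independent of $\mathbf y$ given $\mathbf w$, for each $m \notin \mathsf K$. There are at most $2^{NR} - K$ possible values for $m$ in (9.12). Hence the probability of false positives, conditioned on the joint type class $T_{\mathbf y \mathbf x_{\mathsf K} \mathbf w}$, is

\[
\begin{aligned}
P_{FP}(T_{\mathbf y \mathbf x_{\mathsf K} \mathbf w}, \mathscr{W}_K) &= \Pr[\exists m \notin \mathsf K : I(\mathbf x_m; \mathbf y | \mathbf w) > R + \Delta] \\
&\overset{(a)}{\le} (2^{NR} - K)\, \Pr\nolimits_{\mathbf x}[I(\mathbf x; \mathbf y | \mathbf w) > R + \Delta] \\
&\overset{(b)}{\dot{\le}} 2^{NR}\, 2^{-N(R + \Delta)} = 2^{-N\Delta}
\end{aligned}
\tag{9.13}
\]

where (a) follows from the union bound, and (b) from (9.10) with $\nu = R + \Delta$. Averaging over all type classes $T_{\mathbf y \mathbf x_{\mathsf K} \mathbf w}$, we obtain $P_{FP} \;\dot{\le}\; 2^{-N\Delta}$, from which the claimed false-positive exponent follows.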

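(ii). Detect-One Error Criterion. (Miss all colluders.) We first derive the error exponent for the event that the decoder misses a specific colluder $m \in \mathsf K$. Any coalition $\mathcal K$ that contains $m$ fails the test (4.1), i.e., for any such $\mathcal K$,

\[ I(\mathbf x_m; \mathbf y | \mathbf w) \le R + \Delta. \tag{9.14} \]

The probability of the miss-$m$ event, given the joint type $p^*_{\mathbf{xw}}$, is therefore upper-bounded by the probability of the event (9.14). From (9.11) we obtain

\[
\begin{aligned}
p_{\mathrm{miss}\text{-}m}(p^*_{\mathbf{xw}}, \mathscr{W}_K) &\le \Pr[I(\mathbf x_m; \mathbf y | \mathbf w) \le R + \Delta] \\
&\doteq \exp_2\{-N E_{psp,m,N}(R + \Delta, L, p^*_{\mathbf{xw}}, \mathscr{W}_K)\}.
\end{aligned}
\tag{9.15}
\]

The miss-all event is the intersection of the miss-$m$ events over $m \in \mathsf K$. Its probability is

\[
\begin{aligned}
p_{\mathrm{miss\text{-}all}}(p^*_{\mathbf{xw}}, \mathscr{W}_K) &= \Pr\Big[ \bigcap_{m \in \mathsf K} \{\text{miss } m \mid p^*_{\mathbf{xw}}\} \Big] \\
&\le \min_{m \in \mathsf K} p_{\mathrm{miss}\text{-}m}(p^*_{\mathbf{xw}}, \mathscr{W}_K) \\
&\overset{(a)}{\dot{\le}} \min_{m \in \mathsf K} \exp_2\{-N E_{psp,m,N}(R + \Delta, L, p^*_{\mathbf{xw}}, \mathscr{W}_K)\} \\
&\overset{(b)}{=} \exp_2\{-N \overline{E}_{psp,N}(R + \Delta, L, \mathscr{W}_K)\} \\
&\overset{(c)}{\doteq} \exp_2\{-N \overline{E}_{psp}(R + \Delta, L, \mathscr{W}_K)\}
\end{aligned}
\]

where (a) follows from (9.15), (b) from (9.2) and (9.5), and (c) from (9.7).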

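(iii). Detect-All Error Criterion. (Miss Some Colluders.) The miss-some event is the union of the miss-$m$ events over $m \in \mathsf K$. Its probability is

\[
\begin{aligned}
p_{\mathrm{miss\text{-}some}}(p^*_{\mathbf{xw}}, \mathscr{W}_K) &= \Pr\Big[ \bigcup_{m \in \mathsf K} \{\text{miss } m \mid p^*_{\mathbf{xw}}\} \Big] \\
&\le \sum_{m \in \mathsf K} p_{\mathrm{miss}\text{-}m}(p^*_{\mathbf{xw}}, \mathscr{W}_K) \\
&\doteq \max_{m \in \mathsf K} \exp_2\{-N E_{psp,m,N}(R + \Delta, L, p^*_{\mathbf{xw}}, \mathscr{W}_K)\} \\
&\overset{(a)}{=} \exp_2\{-N \underline{E}_{psp,N}(R + \Delta, L, \mathscr{W}_K)\} \\
&\overset{(b)}{\doteq} \exp_2\{-N \underline{E}_{psp}(R + \Delta, L, \mathscr{W}_K)\}
\end{aligned}
\]

where (a) follows from (9.3) and (9.6), and (b) from (9.8).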

(iv). Fair Collusion Channels. Recall (|4.2p . restated here for convenience: 

^YX^\w{PXW,'^K,R,L,m) = {pyX^\W ■ PX^\W ^ ^{PX\w), PY\X^ e '^K{PXy<), 

hyx^,wPw{Xm;Y\W)<R], meK. 

The union of these sets over m, 

^YXy^\w{Pxw-,^K-,R-,L,'m) (9.16) 



meK 



is convex and permutation- invariant because so is Wk, by assumption. Combining (j9.16p . (j4.2p . 
and (I4.3p . we may write (14. 4|) as 

^psp(i?,L,pxty,^i^) = min D{pYXy^\w\\PY\Xy,Px\w\Pw)- (9-17) 

For any Pyx^\w S V*{Wk) and permutation vr of K, define the permuted conditional p.m.f. 

Pyx^^\w^V^ ^k|w^) = PYXy^\w{y, a;^(K) 1^^) 

and the permutation-averaged p.m.f. Py^ \w ~ W\ Y^-kPyx \w 'which also belongs to the convex 
set r*iWK). We similarly define p^,^^ and ppf^^. Observe that D{p-y^^^^\\p-y^^^p^^^ \pw) is 
independent of vr. By convexity of Kullback-Leibler divergence, this implies 

DiPp%,lw\\PY\x.Px\w\Pw) < ;^E^(^?-XK|Tyll^5y|XKPf|H^lf^^) 

TT 

= DiPYXK\w\\PY\x^Px\w\Pw)- (9.18) 

Therefore the minimum in (j9.17p is achieved by a permutation-invariant PYXt^\w — Py^Xk\w^ ^^'^ 
the same minimum would have been obtained if Wk had been replaced with W^^^ . Hence 

E^^^{R,L,pxw,y^K) = Ep,p{R,L,pxw,^K")- 
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Substituting into (|4.7p and (|4.10p . we obtain 

E^'^^R, L, Wk, A) = L, A). 

(v). The equality 



E'^-^^R, L, A) = ^^"(i?, L, .r^f ^^ A) 

is straightforward because E-^^^^rniR^i^^Vxw -."^k^^) ™ (|4.3p is the same for all m € K, and thus 

(vi). Positive Error Exponents. From Part (v) above, we may restrict our attention to $\mathscr{W}_K = \mathscr{W}^{\mathrm{fair}}_K$. Consider any $\mathcal W = \{1, \cdots, L\}$ and $p_W$ that is positive over its support set (if it is not, reduce the value of $L$ accordingly). For any $m \in \mathsf K$, the minimand in the expression (4.3) for $E_{psp,m}(R, L, p_{XW}, \mathscr{W}^{\mathrm{fair}}_K)$ is zero if and only if

\[ p_{YX_{\mathsf K}|W} = p_{Y|X_{\mathsf K}}\, p^K_{X|W}, \quad \text{with } p_{Y|X_{\mathsf K}} \in \mathscr{W}^{\mathrm{fair}}_K. \]

Such $p_{YX_{\mathsf K}|W}$ is feasible for (4.2) if and only if $(p_{XW}, p_{Y|X_{\mathsf K}})$ is such that $I(X_m; Y | W) \le R$. It is not feasible, and thus a positive error exponent is guaranteed, if $R < I(X_1; Y | W)$. The supremum of all such $R$ is given by (4.12) and is achieved by letting $\Delta \to 0$ and $L \to \infty$. □

10 Proof of Theorem 5.3

We derive the error exponents for the MPMI decision rule (5.7). Again by symmetry of the codebook construction, the error probabilities will be independent of $\mathcal K$. Without loss of optimality, we assume that $\mathcal K = \mathsf K = \{1, 2, \cdots, K\}$. We have also defined $\mathcal W = \{1, 2, \cdots, L\}$. Define for all $\mathsf A \subseteq \mathsf K$

^Y^Xt<,\SW^P^^ Ps|w> P^\s^v,^K,R,L,A) = {pyxKlsw : PxkIsw G -^(PxIsw), Py|xK ^ '^k{P^J , 

/(xA;yxK\A|sw) < |A|i?| (10.1) 

£^psp,A,Af(^'^'Pw,Ps|w,Px|sw,^x) = min 

Pyx I s w G .i^'^y I g (pw , Ps I W I Px I S W I '^if I i I A) 

^(PyxKlswbyl xk ^'xIsw I ^'sw ), (10.2) 

) Pw ) Ps| W ) Px| SW 1 Wk) = D{ps\^\\ps\p^/v) + Epsp,A,N{R,L 

= min 

Ps\Pw), (10.3) 

Epsp,NiR,L,p^,Ps\^,p^lsw^'^K) = Epsp,K,NiR,L,p^,Ps\^,p^\s^,WK), (10.4) 

^psp,Af (-^' Pw) Ps|w) Px|swi #k), (10.5) 

AC K 

) Pwi Ps|w; Px|sw)
where the second equality in (10.3) is obtained by application of the chain rule for divergence.

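Denote by $p^*_{\mathbf w}$ and $p^*_{\mathbf x|\mathbf{sw}}$ the maximizers in (10.6), the latter viewed as a function of $p_{\mathbf s|\mathbf w}$. Moreover, both $p^*_{\mathbf w}$ and $p^*_{\mathbf x|\mathbf{sw}}$ implicitly depend on $R$ and $\mathscr{W}^{\mathrm{fair}}_K$. Denote by $T^*_{\mathbf w}$ and $T^*_{\mathbf x|\mathbf{sw}}$ the corresponding type and conditional type classes. Let

\[ \overline{E}_{psp,N}(R, L, D_1, \mathscr{W}_K) = \min_{p_{\mathbf s|\mathbf w}} \overline{E}_{psp,N}(R, L, p^*_{\mathbf w}, p_{\mathbf s|\mathbf w}, p^*_{\mathbf x|\mathbf{sw}}, \mathscr{W}_K), \tag{10.7} \]

\[ \underline{E}_{psp,N}(R, L, D_1, \mathscr{W}_K) = \min_{p_{\mathbf s|\mathbf w}} \underline{E}_{psp,N}(R, L, p^*_{\mathbf w}, p_{\mathbf s|\mathbf w}, p^*_{\mathbf x|\mathbf{sw}}, \mathscr{W}_K). \tag{10.8} \]

The exponents (10.3)-(10.8) differ from (5.11)-(5.16) in that the optimizations are performed over conditional types instead of general conditional p.m.f.'s. We have

\[ \lim_{N \to \infty} \overline{E}_{psp,N}(R, L, D_1, \mathscr{W}_K) = \overline{E}_{psp}(R, L, D_1, \mathscr{W}_K), \tag{10.9} \]

\[ \lim_{N \to \infty} \underline{E}_{psp,N}(R, L, D_1, \mathscr{W}_K) = \underline{E}_{psp}(R, L, D_1, \mathscr{W}_K) \tag{10.10} \]

by continuity of the divergence and mutual-information functionals.

Codebook. For each $\mathbf w \in T^*_{\mathbf w}$ and $\mathbf s \in \mathcal S^N$, a codebook $\mathcal C(\mathbf s, \mathbf w) = \{\mathbf x_m, 1 \le m \le 2^{NR}\}$ is generated by drawing $2^{NR}$ random vectors independently and uniformly from $T^*_{\mathbf x|\mathbf{sw}}$.

Encoder. A sequence $\mathbf w$ is drawn uniformly from $T^*_{\mathbf w}$ and shared with the decoder. Given $\mathbf s$ and $\mathbf w$, user $m$ is assigned codeword $\mathbf x_m \in \mathcal C(\mathbf s, \mathbf w)$.

Decoder. The decoding rule is the MPMI rule of (5.7).

Collusion Channel. This random code is a RM code, hence by application of Prop. 2.2 it is sufficient to restrict our attention to strongly exchangeable collusion channels.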

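Error Probability Analysis. To analyze the error probability for our random-coding scheme under strongly exchangeable collusion channels, we will again use the bound (9.9) as well as the following three properties, which originate from the basic inequalities (1.1) and (1.2).

1) Fix $(\mathbf s, \mathbf w)$ and $\mathbf z \in \mathcal Z^N$, and draw $\mathbf x_{\mathsf K} = \{\mathbf x_m, m \in \mathsf K\}$ i.i.d. uniformly from a conditional type class $T_{\mathbf x|\mathbf{sw}}$, independently of $\mathbf z$. We have the asymptotic equality

\[ \Pr[T_{\mathbf x_{\mathsf K}|\mathbf{zsw}}] = \frac{|T_{\mathbf x_{\mathsf K}|\mathbf{zsw}}|}{|T_{\mathbf x|\mathbf{sw}}|^K} \doteq 2^{-N[K H(\mathbf x|\mathbf{sw}) - H(\mathbf x_{\mathsf K}|\mathbf{zsw})]} = 2^{-N \mathring{I}(\mathbf x_{\mathsf K}; \mathbf z|\mathbf{sw})} \tag{10.11} \]

where the last equality is due to (5.2). Then

\[
\begin{aligned}
\Pr[\mathring{I}(\mathbf x_{\mathsf K}; \mathbf z | \mathbf{sw}) \ge \nu] &= \sum_{T_{\mathbf x_{\mathsf K}|\mathbf{zsw}}} \Pr[T_{\mathbf x_{\mathsf K}|\mathbf{zsw}}]\; 1\{\mathring{I}(\mathbf x_{\mathsf K}; \mathbf z | \mathbf{sw}) \ge \nu\} \\
&\doteq \sum_{T_{\mathbf x_{\mathsf K}|\mathbf{zsw}}} 2^{-N \mathring{I}(\mathbf x_{\mathsf K}; \mathbf z|\mathbf{sw})}\; 1\{\mathring{I}(\mathbf x_{\mathsf K}; \mathbf z | \mathbf{sw}) \ge \nu\} \\
&\doteq \max_{T_{\mathbf x_{\mathsf K}|\mathbf{zsw}}} 2^{-N \mathring{I}(\mathbf x_{\mathsf K}; \mathbf z|\mathbf{sw})}\; 1\{\mathring{I}(\mathbf x_{\mathsf K}; \mathbf z | \mathbf{sw}) \ge \nu\} \\
&\le 2^{-N\nu}.
\end{aligned}
\tag{10.12}
\]

2) Fix $\mathbf w$ and draw $\mathbf s$ i.i.d. $p_S$. We have [11]

\[ \Pr[T_{\mathbf s|\mathbf w}] \doteq 2^{-N D(p_{\mathbf s|\mathbf w} \| p_S | p_{\mathbf w})}. \tag{10.13} \]

3) Fix $(\mathbf s, \mathbf w)$, draw $\mathbf x_k$, $k \in \mathsf K$, i.i.d. uniformly from a conditional type class $T_{\mathbf x|\mathbf{sw}}$, and then draw $\mathbf Y$ uniformly from a single conditional type class $T_{\mathbf y|\mathbf x_{\mathsf K}}$. We have

\[
\begin{aligned}
\Pr[T_{\mathbf y \mathbf x_{\mathsf K}|\mathbf{sw}}] &= \Pr[T_{\mathbf y|\mathbf x_{\mathsf K} \mathbf{sw}}]\; \Pr[T_{\mathbf x_{\mathsf K}|\mathbf{sw}}] \\
&= \frac{|T_{\mathbf y|\mathbf x_{\mathsf K} \mathbf{sw}}|}{|T_{\mathbf y|\mathbf x_{\mathsf K}}|}\; \frac{|T_{\mathbf x_{\mathsf K}|\mathbf{sw}}|}{|T_{\mathbf x|\mathbf{sw}}|^K} \\
&\doteq 2^{-N[H(\mathbf y|\mathbf x_{\mathsf K}) - H(\mathbf y|\mathbf x_{\mathsf K} \mathbf{sw})]}\; 2^{-N[K H(\mathbf x|\mathbf{sw}) - H(\mathbf x_{\mathsf K}|\mathbf{sw})]} \\
&= \exp_2\left\{ -N\left[ I(\mathbf y; \mathbf{sw} | \mathbf x_{\mathsf K}) + \mathring{I}(\mathbf x_1; \cdots; \mathbf x_K | \mathbf{sw}) \right] \right\}.
\end{aligned}
\tag{10.14}
\]

Consider the two terms in brackets above. The first one may be written as

\[
\begin{aligned}
I(\mathbf y; \mathbf{sw} | \mathbf x_{\mathsf K}) &= D(p_{\mathbf{ysw}|\mathbf x_{\mathsf K}} \,\|\, p_{\mathbf y|\mathbf x_{\mathsf K}}\, p_{\mathbf{sw}|\mathbf x_{\mathsf K}} \,|\, p_{\mathbf x_{\mathsf K}}) \\
&= D(p_{\mathbf{ysw}\mathbf x_{\mathsf K}} \,\|\, p_{\mathbf y|\mathbf x_{\mathsf K}}\, p_{\mathbf{sw}\mathbf x_{\mathsf K}}) \\
&= D(p_{\mathbf y \mathbf x_{\mathsf K}|\mathbf{sw}} \,\|\, p_{\mathbf y|\mathbf x_{\mathsf K}}\, p_{\mathbf x_{\mathsf K}|\mathbf{sw}} \,|\, p_{\mathbf{sw}})
\end{aligned}
\]

and the second one as

\[ \mathring{I}(\mathbf x_1; \cdots; \mathbf x_K | \mathbf{sw}) = D(p_{\mathbf x_{\mathsf K}|\mathbf{sw}} \,\|\, p^K_{\mathbf x|\mathbf{sw}} \,|\, p_{\mathbf{sw}}). \]

By application of the chain rule for divergence, the sum of these two terms is $D(p_{\mathbf y \mathbf x_{\mathsf K}|\mathbf{sw}} \,\|\, p_{\mathbf y|\mathbf x_{\mathsf K}}\, p^K_{\mathbf x|\mathbf{sw}} \,|\, p_{\mathbf{sw}})$. Substituting into (10.14), we obtain

\[ \Pr[T_{\mathbf y \mathbf x_{\mathsf K}|\mathbf{sw}}] \doteq \exp_2\left\{ -N D(p_{\mathbf y \mathbf x_{\mathsf K}|\mathbf{sw}} \,\|\, p_{\mathbf y|\mathbf x_{\mathsf K}}\, p^K_{\mathbf x|\mathbf{sw}} \,|\, p_{\mathbf{sw}}) \right\}. \tag{10.15} \]

In the derivation below we use the shorthand $e(p_{\mathbf y \mathbf x_{\mathsf K}|\mathbf{sw}})$ to represent the exponential above, and fix $T_{\mathbf x|\mathbf{sw}} = T^*_{\mathbf x|\mathbf{sw}}$.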

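For any feasible, strongly exchangeable collusion channel, for any $\mathcal{A} \subseteq \mathcal{K}$ and $\nu > 0$, conditioning
on $\mathbf{w} \in T^*_{\mathbf{w}}$ and $\mathbf{s} \in \mathcal{S}^N$, we have

$$
\begin{aligned}
\Pr\left[\mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{K}\setminus\mathcal{A}}|\mathbf{sw}) \le |\mathcal{A}|\nu\right]
&\overset{(a)}{\le} \sum_{\text{feasible } T_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}}} \Pr[T_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}}]\, \mathbb{1}\{\mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{K}\setminus\mathcal{A}}|\mathbf{sw}) \le |\mathcal{A}|\nu\} \\
&\overset{(b)}{\doteq} \sum_{\text{feasible } p_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}}} e(p_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}})\, \mathbb{1}\{\mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{K}\setminus\mathcal{A}}|\mathbf{sw}) \le |\mathcal{A}|\nu\} \\
&\overset{(c)}{\doteq} \max_{\text{feasible } p_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}}} e(p_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}})\, \mathbb{1}\{\mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{K}\setminus\mathcal{A}}|\mathbf{sw}) \le |\mathcal{A}|\nu\} \\
&= \max_{p_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}}\,:\; p_{\mathbf{x}_{\mathcal{K}}|\mathbf{sw}} \in \mathscr{M}(p^*_{\mathbf{x}|\mathbf{sw}}),\; p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}})} e(p_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}})\, \mathbb{1}\{\mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{K}\setminus\mathcal{A}}|\mathbf{sw}) \le |\mathcal{A}|\nu\} \\
&= \max_{p_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}}\,:\; p_{\mathbf{x}_{\mathcal{K}}|\mathbf{sw}} \in \mathscr{M}(p^*_{\mathbf{x}|\mathbf{sw}}),\; p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}}),\; \mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{K}\setminus\mathcal{A}}|\mathbf{sw}) \le |\mathcal{A}|\nu} e(p_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}}) \\
&\overset{(d)}{=} \max_{p_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}} \text{ feasible for (10.1)}} e(p_{\mathbf{y}\mathbf{x}_{\mathcal{K}}|\mathbf{sw}}) \\
&\overset{(e)}{=} \exp_2\left\{-N E_{psp,\mathcal{A},N}(\nu, L, p^*_{\mathbf{w}}, p_{\mathbf{s}|\mathbf{w}}, p^*_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K)\right\}
\end{aligned} \tag{10.16}
$$

where (a) follows from (9.9), (b) from (10.15), (c) from the fact that the number of conditional
types is polynomial in $N$, (d) from (10.1), and (e) from (10.2).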

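(i). False Positives. A false positive occurs if $\hat{\mathcal{K}} \setminus \mathcal{K} \ne \emptyset$. By application of (5.8), we have

$$
\forall \mathcal{A} \subseteq \hat{\mathcal{K}}: \quad \mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\hat{\mathcal{K}}\setminus\mathcal{A}}|\mathbf{sw}) > |\mathcal{A}|(R + \Delta). \tag{10.17}
$$

Denote by $\mathcal{B}$ the set of colluder indices $m \in \mathcal{K}$ that are correctly identified by the decoder, and
by $\mathcal{A} = \hat{\mathcal{K}} \setminus \mathcal{B}$ the complement set, which comprises all incorrectly accused users and has
cardinality $|\mathcal{A}| \ge 1$. By construction of the codebook, $\mathbf{x}_{\mathcal{A}}$ is independent of $\mathbf{y}$ and $\mathbf{x}_{\mathcal{B}}$. The
probability of the event (10.17) is upper-bounded by the probability of the larger event

$$
\exists \mathcal{B} \subseteq \mathcal{K},\ \exists \mathcal{A}: \quad \mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{B}}|\mathbf{sw}) > |\mathcal{A}|(R + \Delta). \tag{10.18}
$$

Hence the probability of false positives, conditioned on $T_{\mathbf{y}\mathbf{x}_{\mathcal{K}}\mathbf{sw}}$, satisfies

$$
\begin{aligned}
P_{FP}(T_{\mathbf{y}\mathbf{x}_{\mathcal{K}}\mathbf{sw}}, \mathscr{W}_K)
&= \Pr\left[\bigcup_{\mathcal{B} \subseteq \mathcal{K}}\ \bigcup_{|\mathcal{A}| \ge 1} \left\{\mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{B}}|\mathbf{sw}) > |\mathcal{A}|(R + \Delta)\right\}\right] \\
&\overset{(a)}{\le} \sum_{\mathcal{B} \subseteq \mathcal{K}}\ \sum_{|\mathcal{A}| \ge 1} \Pr\left[\mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{B}}|\mathbf{sw}) > |\mathcal{A}|(R + \Delta)\right] \\
&\overset{(b)}{\le} \sum_{\mathcal{B} \subseteq \mathcal{K}}\ \sum_{|\mathcal{A}| \ge 1} 2^{N|\mathcal{A}|R}\, 2^{-N|\mathcal{A}|(R+\Delta)} \\
&= \sum_{\mathcal{B} \subseteq \mathcal{K}}\ \sum_{|\mathcal{A}| \ge 1} 2^{-N|\mathcal{A}|\Delta} \\
&\doteq 2^{-N\Delta}
\end{aligned} \tag{10.19}
$$

where (a) follows from the union bound, and (b) from (10.12) with $\mathbf{y}\mathbf{x}_{\mathcal{B}}$ in place of $\mathbf{z}$. In the last two
sums the index runs over the coalition sizes $|\mathcal{A}|$, and the factor $2^{N|\mathcal{A}|R}$ upper-bounds the number of
sets $\mathcal{A}$ of a given size among the $2^{NR}$ users. Averaging
over all joint type classes $T_{\mathbf{y}\mathbf{x}_{\mathcal{K}}\mathbf{sw}}$, we obtain $P_{FP} \mathrel{\dot{\le}} 2^{-N\Delta}$, from which (5.17) follows.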
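The last step of (10.19) relies on the sum being dominated by its $|\mathcal{A}| = 1$ term. The following Python sketch evaluates the union bound numerically in the log domain and reports its normalized exponent, which approaches $\Delta$ as $N$ grows. The helper names, the truncation of the sum at `max_size`, and the use of $2^K$ as the number of subsets $\mathcal{B}$ are assumptions of this illustration rather than part of the proof.

```python
from math import log2

def log2_sum_exp2(exponents):
    """Return log2(sum_i 2**e_i), computed stably around the largest exponent."""
    m = max(exponents)
    return m + log2(sum(2.0 ** (e - m) for e in exponents))

def false_positive_exponent(N, K, R, delta, max_size):
    """Normalized exponent -(1/N) log2 of the union bound in (10.19).

    Each coalition size a contributes 2^{N a R} * 2^{-N a (R + delta)};
    the 2^K subsets B of the coalition add K bits to the log of the bound.
    """
    terms = [N * a * R - N * a * (R + delta) for a in range(1, max_size + 1)]
    log2_bound = K + log2_sum_exp2(terms)
    return -log2_bound / N

for N in (100, 1000, 10000):
    print(N, false_positive_exponent(N, K=3, R=0.5, delta=0.1, max_size=5))
```

The printed exponents increase toward `delta` with `N`, since the sub-exponential factor contributed by the subsets $\mathcal{B}$ becomes negligible after normalization by $N$.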

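(ii). Detect-All Error Criterion. (Miss Some Colluders.) Under the detect-all error event,
any coalition $\hat{\mathcal{K}}$ that contains $\mathcal{K}$ fails the test. By (5.8), this implies that

$$
\exists \mathcal{A} \subseteq \hat{\mathcal{K}}: \quad \mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\hat{\mathcal{K}}\setminus\mathcal{A}}|\mathbf{sw}) \le |\mathcal{A}|(R + \Delta). \tag{10.20}
$$

In particular, for $\hat{\mathcal{K}} = \mathcal{K}$ we have

$$
\exists \mathcal{A} \subseteq \mathcal{K}: \quad \mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{K}\setminus\mathcal{A}}|\mathbf{sw}) \le |\mathcal{A}|(R + \Delta). \tag{10.21}
$$

The probability of the miss-some event, conditioned on $(\mathbf{s},\mathbf{w})$, is therefore upper bounded by the
probability of the event (10.21):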

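$$
\begin{aligned}
P_{\text{miss-some}}(p_{\mathbf{w}} p_{\mathbf{s}|\mathbf{w}}, p_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K)
&\le \Pr\left[\bigcup_{\mathcal{A} \subseteq \mathcal{K}} \left\{\mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{K}\setminus\mathcal{A}}|\mathbf{sw}) \le |\mathcal{A}|(R + \Delta)\right\}\right] \\
&\le \sum_{\mathcal{A} \subseteq \mathcal{K}} \Pr\left[\mathring{I}(\mathbf{x}_{\mathcal{A}};\mathbf{y}\mathbf{x}_{\mathcal{K}\setminus\mathcal{A}}|\mathbf{sw}) \le |\mathcal{A}|(R + \Delta)\right] \\
&\overset{(a)}{\mathrel{\dot{\le}}} \sum_{\mathcal{A} \subseteq \mathcal{K}} \exp_2\left\{-N E_{psp,\mathcal{A},N}(R + \Delta, L, p^*_{\mathbf{w}}, p_{\mathbf{s}|\mathbf{w}}, p^*_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K)\right\} \\
&\doteq \max_{\mathcal{A} \subseteq \mathcal{K}} \exp_2\left\{-N E_{psp,\mathcal{A},N}(R + \Delta, L, p^*_{\mathbf{w}}, p_{\mathbf{s}|\mathbf{w}}, p^*_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K)\right\} \\
&= \exp_2\left\{-N \min_{\mathcal{A} \subseteq \mathcal{K}} E_{psp,\mathcal{A},N}(R + \Delta, L, p^*_{\mathbf{w}}, p_{\mathbf{s}|\mathbf{w}}, p^*_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K)\right\}
\end{aligned} \tag{10.22}
$$

where (a) follows from (10.16) with $\nu = R + \Delta$.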
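Averaging over $\mathbf{S}$, we obtain

$$
\begin{aligned}
P_{\text{miss-some}}(\mathscr{W}_K)
&= \sum_{p_{\mathbf{s}|\mathbf{w}}} \Pr[T_{\mathbf{s}|\mathbf{w}}]\, P_{\text{miss-some}}(p_{\mathbf{w}} p_{\mathbf{s}|\mathbf{w}}, p_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K) \\
&\overset{(a)}{\mathrel{\dot{\le}}} \max_{p_{\mathbf{s}|\mathbf{w}}} \exp_2\left\{-N\left[D(p_{\mathbf{s}|\mathbf{w}} \| p_S | p^*_{\mathbf{w}}) + \min_{\mathcal{A} \subseteq \mathcal{K}} E_{psp,\mathcal{A},N}(R + \Delta, L, p^*_{\mathbf{w}}, p_{\mathbf{s}|\mathbf{w}}, p^*_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K)\right]\right\} \\
&\overset{(b)}{=} \max_{p_{\mathbf{s}|\mathbf{w}}} \exp_2\left\{-N \overline{E}_{psp,N}(R + \Delta, L, p^*_{\mathbf{w}}, p_{\mathbf{s}|\mathbf{w}}, p^*_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K)\right\} \\
&\overset{(c)}{=} \exp_2\left\{-N \overline{E}_{psp,N}(R + \Delta, L, D_1, \mathscr{W}_K)\right\} \\
&\overset{(d)}{\doteq} \exp_2\left\{-N \overline{E}_{psp}(R + \Delta, L, D_1, \mathscr{W}_K)\right\}
\end{aligned}
$$

which proves (5.18). Here (a) follows from (10.13) and (10.22), (b) and (c) from the definitions of the
exponents given earlier in this section, and (d) from the limit property (10.10).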

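(iii). Detect-One Criterion. (Miss All Colluders.) Under the detect-one error event, either
the estimated coalition $\hat{\mathcal{K}}$ is empty, or it is a set $\mathcal{I}$ of innocent users (disjoint with $\mathcal{K}$). Hence
$P^{\text{one}}_e \le \Pr[\hat{\mathcal{K}} = \emptyset] + \Pr[\hat{\mathcal{K}} = \mathcal{I}]$. The first probability, conditioned on $(\mathbf{s},\mathbf{w})$, is bounded as

$$
\begin{aligned}
\Pr[\hat{\mathcal{K}} = \emptyset] &= \Pr[\forall \mathcal{K}' : \ MPMI(\mathcal{K}') \le 0] \\
&\le \Pr[MPMI(\mathcal{K}) \le 0] \\
&= \Pr[\mathring{I}(\mathbf{x}_{\mathcal{K}};\mathbf{y}|\mathbf{sw}) \le K(R + \Delta)] \\
&\doteq \exp_2\left\{-N E_{psp,\mathcal{K},N}(R + \Delta, L, p^*_{\mathbf{w}}, p_{\mathbf{s}|\mathbf{w}}, p^*_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K)\right\}.
\end{aligned} \tag{10.23}
$$

To bound the second probability, we use property (5.9) with $\hat{\mathcal{K}} = \mathcal{I}$ and $\mathcal{A} = \mathcal{K}$. We obtain

$$
\mathring{I}(\mathbf{x}_{\mathcal{K}};\mathbf{y}\mathbf{x}_{\mathcal{I}}|\mathbf{sw}) \le K(R + \Delta).
$$

Since

$$
\mathring{I}(\mathbf{x}_{\mathcal{K}};\mathbf{y}\mathbf{x}_{\mathcal{I}}|\mathbf{sw}) = \mathring{I}(\mathbf{x}_{\mathcal{K}};\mathbf{y}|\mathbf{sw}) + I(\mathbf{x}_{\mathcal{K}};\mathbf{x}_{\mathcal{I}}|\mathbf{ysw}) \ge \mathring{I}(\mathbf{x}_{\mathcal{K}};\mathbf{y}|\mathbf{sw}),
$$

combining the two inequalities above yields

$$
\mathring{I}(\mathbf{x}_{\mathcal{K}};\mathbf{y}|\mathbf{sw}) \le K(R + \Delta).
$$

The probability of this event is again given by (10.23); we conclude that

$$
P_{\text{miss-all}}(p_{\mathbf{w}} p_{\mathbf{s}|\mathbf{w}}, p_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K) \doteq \exp_2\left\{-N E_{psp,\mathcal{K},N}(R + \Delta, L, p^*_{\mathbf{w}}, p_{\mathbf{s}|\mathbf{w}}, p^*_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K)\right\}.
$$

Averaging over $\mathbf{S}$ and proceeding as in Part (ii) above, we obtain

$$
P_{\text{miss-all}}(\mathscr{W}_K) = \sum_{p_{\mathbf{s}|\mathbf{w}}} \Pr[T_{\mathbf{s}|\mathbf{w}}]\, P_{\text{miss-all}}(p_{\mathbf{w}} p_{\mathbf{s}|\mathbf{w}}, p_{\mathbf{x}|\mathbf{sw}}, \mathscr{W}_K) \doteq \exp_2\left\{-N E_{psp}(R + \Delta, L, D_1, \mathscr{W}_K)\right\}
$$

which establishes (5.19).

Footnote to (10.23). Using the bound $\min_{\mathcal{K}' \subseteq \mathcal{K}} \Pr[MPMI(\mathcal{K}') \le 0]$ would not strengthen the inequality in (10.23).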

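(iv). Fair Collusion Channels. The proof parallels that of Theorem 4.1, Part (iv). Define

$$
\mathscr{P}^*(\mathscr{W}_K) = \mathscr{P}_{YX_{\mathcal{K}}|SW}(p_w, p_{s|w}, p_{x|sw}, \mathscr{W}_K, R, L, \mathcal{K}) \tag{10.24}
$$

which is convex and permutation-invariant. Then write (5.12) as

$$
E_{psp}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathscr{W}_K) = \min_{p_{YX_{\mathcal{K}}|SW} \in \mathscr{P}^*(\mathscr{W}_K)} D(p_{YX_{\mathcal{K}}|SW} \| p_{Y|X_{\mathcal{K}}}\, p^K_{X|SW} \,|\, p_{s|w}\, p_w). \tag{10.25}
$$

For any $p_{YX_{\mathcal{K}}|SW} \in \mathscr{P}^*(\mathscr{W}_K)$ and permutation $\pi$ of $\mathcal{K}$, define the permuted conditional p.m.f.
$p^{\pi}_{YX_{\mathcal{K}}|SW}$ and the permutation-averaged p.m.f. $\bar{p}_{YX_{\mathcal{K}}|SW} = \frac{1}{K!} \sum_{\pi} p^{\pi}_{YX_{\mathcal{K}}|SW}$, which also belongs
to the convex set $\mathscr{P}^*(\mathscr{W}_K)$. We similarly define $p^{\pi}_{Y|X_{\mathcal{K}}}$ and $\bar{p}_{Y|X_{\mathcal{K}}}$. The conditional divergence
$D(p^{\pi}_{YX_{\mathcal{K}}|SW} \| p^{\pi}_{Y|X_{\mathcal{K}}}\, p^K_{X|SW} \,|\, p_{s|w}\, p_w)$ is independent of $\pi$. By convexity, we obtain

$$
D(\bar{p}_{YX_{\mathcal{K}}|SW} \| \bar{p}_{Y|X_{\mathcal{K}}}\, p^K_{X|SW} \,|\, p_{s|w}\, p_w) \le D(p_{YX_{\mathcal{K}}|SW} \| p_{Y|X_{\mathcal{K}}}\, p^K_{X|SW} \,|\, p_{s|w}\, p_w). \tag{10.26}
$$

Therefore the minimum in (10.25) is achieved by a permutation-invariant $p_{YX_{\mathcal{K}}|SW} = \bar{p}_{YX_{\mathcal{K}}|SW}$,
and the same minimum would have been obtained if $\mathscr{W}_K$ had been replaced with $\mathscr{W}_K^{\text{fair}}$. Hence

$$
E_{psp}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathscr{W}_K) = E_{psp}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathscr{W}_K^{\text{fair}}).
$$

Substituting into (5.15) and (5.19), we obtain

$$
E^{\text{one}}(R, L, D_1, \mathscr{W}_K, \Delta) = E^{\text{one}}(R, L, D_1, \mathscr{W}_K^{\text{fair}}, \Delta).
$$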

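(v). Detect-All Error Exponent for Fair Collusion Channels. Using (5.10) and (5.11),
observe that $\overline{E}_{psp}$ in (5.13) may be written as

$$
\overline{E}_{psp}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathscr{W}_K) = \min_{p_{YX_{\mathcal{K}}|SW} \in \overline{\mathscr{P}}^*(\mathscr{W}_K)} D(p_{yx_{\mathcal{K}}|sw}\, p_{s|w} \| p_{y|x_{\mathcal{K}}}\, p^K_{x|sw}\, p_S \,|\, p_w) \tag{10.27}
$$

where

$$
\overline{\mathscr{P}}^*(\mathscr{W}_K) = \left\{ p_{yx_{\mathcal{K}}|sw} \,:\ p_{x_{\mathcal{K}}|sw} \in \mathscr{M}(p_{x|sw}),\ p_{y|x_{\mathcal{K}}} \in \mathscr{W}_K(p_{x_{\mathcal{K}}}),\ \min_{\mathcal{A} \subseteq \mathcal{K}} \frac{1}{|\mathcal{A}|} \mathring{I}(X_{\mathcal{A}}; YX_{\mathcal{K}\setminus\mathcal{A}}|SW) \le R \right\}.
$$

Similarly to the discussion below (10.25), when $\mathscr{W}_K = \mathscr{W}_K^{\text{fair}}$ the minimum over $p_{YX_{\mathcal{K}}|SW}$ in (10.27)
is achieved by a permutation-invariant conditional p.m.f.

Next we show that $\mathcal{K}$ minimizes $\frac{1}{|\mathcal{A}|} \mathring{I}(X_{\mathcal{A}}; YX_{\mathcal{K}\setminus\mathcal{A}}|SW)$ over $\mathcal{A} \subseteq \mathcal{K}$. Indeed

$$
\begin{aligned}
\frac{1}{|\mathcal{A}|} \mathring{I}(X_{\mathcal{A}}; YX_{\mathcal{K}\setminus\mathcal{A}}|SW)
&= \frac{1}{|\mathcal{A}|} \left[ \sum_{m \in \mathcal{A}} H(X_m|SW) + H(YX_{\mathcal{K}\setminus\mathcal{A}}|SW) - H(YX_{\mathcal{K}}|SW) \right] \\
&= H(X|SW) - \frac{1}{|\mathcal{A}|} H(X_{\mathcal{A}}|YX_{\mathcal{K}\setminus\mathcal{A}}SW) \\
&\overset{(a)}{\ge} H(X|SW) - \frac{1}{K} H(X_{\mathcal{K}}|YSW) \\
&= \frac{1}{K} \mathring{I}(X_{\mathcal{K}}; Y|SW)
\end{aligned} \tag{10.28}
$$

where (a) follows from (3.1) with $Z = (Y, S, W)$.

Using the definitions of the two sets together with (10.28), we obtain $\overline{\mathscr{P}}^*(\mathscr{W}_K^{\text{fair}}) = \mathscr{P}^*(\mathscr{W}_K^{\text{fair}})$. Hence

$$
\begin{aligned}
\overline{E}_{psp}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathscr{W}_K^{\text{fair}})
&= \min_{p_{YX_{\mathcal{K}}|SW} \in \mathscr{P}^*(\mathscr{W}_K^{\text{fair}})} D(p_{yx_{\mathcal{K}}|sw}\, p_{s|w} \| p_{y|x_{\mathcal{K}}}\, p^K_{x|sw}\, p_S \,|\, p_w) \\
&= E_{psp}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathscr{W}_K^{\text{fair}})
\end{aligned}
$$

and therefore

$$
E^{\text{all}}(R, L, D_1, \mathscr{W}_K^{\text{fair}}, \Delta) = E^{\text{one}}(R, L, D_1, \mathscr{W}_K^{\text{fair}}, \Delta).
$$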

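(vi). Positive Error Exponents. Consider any $\mathcal{W} = \{1, \cdots, L\}$ and $p_W$ that is positive over
its support set (if it is not, reduce the value of $L$ accordingly). For any $\mathcal{A} \subseteq \mathcal{K}$, the divergence to
be minimized in the expression (5.11) for $E_{psp,\mathcal{A}}(R, L, p_W, p_{S|W}, p_{X|SW}, \mathscr{W}_K)$ is zero if and only if

$$
p_{YX_{\mathcal{K}}|SW} = p_{Y|X_{\mathcal{K}}}\, p^K_{X|SW} \quad \text{and} \quad p_{S|W} = p_S.
$$

These p.m.f.'s are feasible for (5.10) if and only if the resulting $I(X_{\mathcal{A}}; YX_{\mathcal{K}\setminus\mathcal{A}}|SW) \le |\mathcal{A}| R$. They
are infeasible, and thus positive error exponents are guaranteed, if

$$
R < \min_{\mathcal{A} \subseteq \mathcal{K}} \frac{1}{|\mathcal{A}|} I(X_{\mathcal{A}}; YX_{\mathcal{K}\setminus\mathcal{A}}|SW).
$$

From Part (iv) above, we may restrict our attention to $\mathscr{W}_K = \mathscr{W}_K^{\text{fair}}$ under the detect-one
criterion. Since the p.m.f. of $(S, W, X_{\mathcal{K}}, Y)$ is permutation-invariant, by application of (3.3) we
have

$$
\min_{\mathcal{A} \subseteq \mathcal{K}} \frac{1}{|\mathcal{A}|} I(X_{\mathcal{A}}; YX_{\mathcal{K}\setminus\mathcal{A}}|SW) = \frac{1}{K} I(X_{\mathcal{K}}; Y|SW). \tag{10.29}
$$

Hence the supremum of all $R$ for which error exponents are positive is given by $C^{\text{one}}(D_1, \mathscr{W}_K)$ in (3.8)
and is obtained by letting $\Delta \to 0$ and $L \to \infty$.

For any $\mathscr{W}_K$, under the detect-all criterion, the supremum of all $R$ for which error exponents are
positive is given by $C^{\text{all}}(D_1, \mathscr{W}_K)$ in (3.9) and is obtained by letting $\Delta \to 0$ and $L \to \infty$. Since the
optimal p.m.f. is not necessarily permutation-invariant, (10.29) does not hold in general. However,
if $\mathscr{W}_K = \mathscr{W}_K^{\text{fair}}$, the same capacity is obtained for the detect-one and detect-all problems. □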
11 Conclusion



We have derived exact fingerprinting capacity formulas as opposed to bounds derived in recent 
papers [4,5,10], and constructed a universal fingerprinting scheme. A distinguishing feature of 
this new scheme is the use of an auxiliary "time-sharing" randomized sequence W. The analysis 
shows that optimal coalitions are fair and that capacity and random-coding exponents are the same 
whether the problem is formulated as catching one colluder or all of them. 

Our study also allows us to reexamine previous fingerprinting system designs from a new angle. 
First, randomization of the encoder via $W$ is generally needed because the payoff function in the
mutual-information game is nonconcave with respect to $p_{X|S}$. Thus capacity is obtained as the
value of a mutual-information game with $p_{XW|S}$ as the maximizing variable. This has motivated
the construction of our randomized fingerprinting scheme, which may also be thought of as a
generalization of Tardos' design [9]. Two other randomization methods are also fundamental:
randomized permutation of user indices to ensure that maximum error probability (over all possible
coalitions) equals average error probability; and randomized permutation of the letters $\{1, 2, \cdots, N\}$
to cope with collusion channels with arbitrary memory.

Second, single-user decoders are simple but suboptimal. Such decoders assign a score to each 
user based on his individual fingerprint and the received data, and declare guilty those users whose 
score exceeds some threshold. While this is a reasonable approach, performance can be improved 
by making joint decisions about the coalition. Similarly, the fingerprinting schemes proposed in [9] 
and in much of the signal processing literature might be improved by adopting a joint-decision 
principle, at the expense of increased decoding complexity. 
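To make the single-user approach described above concrete, here is a minimal Python sketch of a threshold decoder. It scores each user by the empirical mutual information between that user's fingerprint and the received sequence and accuses every user whose score exceeds the threshold. The choice of score, the omission of the side information $(\mathbf{s}, \mathbf{w})$, and all function names are assumptions of this illustration; the sketch is not the decoder analyzed in the paper.

```python
from collections import Counter
from math import log2

def entropy(*seqs):
    """Empirical entropy (bits) of the joint type of equal-length sequences."""
    n = len(seqs[0])
    return -sum(c / n * log2(c / n) for c in Counter(zip(*seqs)).values())

def empirical_mi(x, y):
    """Empirical mutual information I(x; y) = H(x) + H(y) - H(x, y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def threshold_decoder(codebook, y, threshold):
    """Accuse every user whose individual score exceeds the threshold.

    Each user is scored independently of the others, which is what makes
    this a single-user (rather than joint) decision rule.
    """
    return [m for m, x in enumerate(codebook) if empirical_mi(x, y) > threshold]

# Toy example: user 0's fingerprint equals the pirated copy, user 1's is
# empirically independent of it.
y = [0, 1, 0, 1, 0, 1, 0, 1]
codebook = [list(y), [0, 0, 1, 1, 0, 0, 1, 1]]
print(threshold_decoder(codebook, y, threshold=0.5))
```

A joint decoder would instead score candidate coalitions as a whole, which costs more computation because the number of candidate sets grows quickly with the coalition size.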

Acknowledgments. The author is very grateful to Dr. Ying Wang for reading several drafts of 
this paper and making comments and suggestions that have improved it. He also thanks Yen-Wei
Huang, Prof. Barg, and the anonymous reviewers for helpful comments.
A Proof of Lemma 3.1

Due to the permutation-invariance assumption on the joint p.m.f. of $(X_{\mathcal{K}}, Z)$, it suffices to establish
(3.1) for $\mathcal{A} = \{1, \cdots, k-1\}$ and $\mathcal{B} = \{1, \cdots, k\}$, where $2 \le k \le K$. The claim then follows by
induction over $k$. Let $Z_k = (Z, X_{k+1}^K)$, hence $Z_{k-1} = (Z_k, X_k)$. Then (3.1) takes the form

$$
\frac{1}{k-1} H(X_1^{k-1}|Z, X_k^K) \le \frac{1}{k} H(X_1^k|Z, X_{k+1}^K)
$$

or equivalently

$$
(k-1) H(X_1^k|Z_k) \ge k H(X_1^{k-1}|Z_k X_k), \quad 2 \le k \le K. \tag{A.1}
$$

And indeed the difference between left and right sides of (A.1) satisfies

$$
\begin{aligned}
(k-1) H(X_1^k|Z_k) - k H(X_1^{k-1}|Z_k X_k)
&= (k-1)\left[H(X_k|Z_k) + H(X_1^{k-1}|Z_k X_k)\right] - k H(X_1^{k-1}|Z_k X_k) \\
&= (k-1) H(X_k|Z_k) - H(X_1^{k-1}|Z_k X_k) \\
&\overset{(a)}{=} \sum_{i=1}^{k-1} H(X_i|Z_k) - H(X_1^{k-1}|Z_k X_k) \\
&\overset{(b)}{\ge} H(X_1^{k-1}|Z_k) - H(X_1^{k-1}|Z_k X_k) \\
&= I(X_1^{k-1}; X_k|Z_k) \\
&\overset{(c)}{\ge} 0
\end{aligned}
$$

where (a) holds because the conditional p.m.f.'s $p_{X_i|Z_k}$, $1 \le i \le k$, are identical due to the
permutation invariance assumption. Inequalities (b) and (c) hold with equality when $X_i$, $1 \le i \le k$, are
conditionally independent given $Z_k$.

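Similarly, to establish (3.2), it suffices to prove that

$$
(k-1) H(X_1^k|Z) \le k H(X_1^{k-1}|Z). \tag{A.2}
$$

We have

$$
\begin{aligned}
(k-1) H(X_1^k|Z) - k H(X_1^{k-1}|Z)
&= (k-1)\left[H(X_1^{k-1}|Z) + H(X_k|Z, X_1^{k-1})\right] - k H(X_1^{k-1}|Z) \\
&= (k-1) H(X_k|Z, X_1^{k-1}) - H(X_1^{k-1}|Z) \\
&\overset{(a)}{=} \sum_{i=1}^{k-1} H(X_i|Z, X_1^{i-1}, X_{i+1}^k) - H(X_1^{k-1}|Z) \\
&\overset{(b)}{=} \sum_{i=1}^{k-1} H(X_i|Z, X_1^{i-1}, X_{i+1}^k) - \sum_{i=1}^{k-1} H(X_i|Z, X_1^{i-1}) \\
&= -\sum_{i=1}^{k-1} I(X_i; X_{i+1}^k|Z, X_1^{i-1}) \\
&\le 0
\end{aligned}
$$

where in (a) we have used the permutation invariance of the distribution of $X_{\mathcal{K}}$, and in (b) the
chain rule for entropy. □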
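The two monotonicity properties (A.1) and (A.2) can be checked numerically on a small permutation-invariant example. The Python sketch below takes $Z$ to be trivial and uses an equal mixture of i.i.d. Bernoulli(0.2) and Bernoulli(0.8) sequences, which is exchangeable; the particular distribution, the value $K = 4$, and the function names are assumptions chosen for illustration only.

```python
from itertools import product
from math import log2

def pmf(x, p=0.2):
    """Exchangeable p.m.f.: equal mixture of i.i.d. Bernoulli(p) and Bernoulli(1-p)."""
    ones = sum(x)
    zeros = len(x) - ones
    return 0.5 * (p ** ones * (1 - p) ** zeros + (1 - p) ** ones * p ** zeros)

def entropy_first(k):
    """H(X_1, ..., X_k) in bits; any k coordinates have this same mixture form."""
    return -sum(pmf(x) * log2(pmf(x)) for x in product((0, 1), repeat=k))

K = 4
H = [entropy_first(k) for k in range(K + 1)]

# (A.2) with trivial Z: (1/k) H(X_1^k) is nonincreasing in k.
per_symbol_joint = [H[k] / k for k in range(1, K + 1)]

# (A.1) with trivial Z: (1/k) H(X_1^k | X_{k+1}^K) is nondecreasing in k,
# using H(X_1^k | X_{k+1}^K) = H(X_1^K) - H(X_1^{K-k}) by exchangeability.
per_symbol_cond = [(H[K] - H[K - k]) / k for k in range(1, K + 1)]

print(per_symbol_joint)
print(per_symbol_cond)
```

Both sequences meet at $k = K$, where each equals $\frac{1}{K} H(X_{\mathcal{K}})$, in line with the role the lemma plays in (10.28).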
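B Proof of Lemma 3.2

The derivation below is given in terms of the detect-one criterion but applies straightforwardly to
the detect-all criterion as well. Denote by $C^{\text{one}}_{\text{memoryless}}(D_1, \mathscr{W}_K)$ the compound capacity under the
detect-one criterion. To prove the claim (B.1),
it suffices to identify a family of collusion channels satisfying (2.10) and for which reliable decoding
is impossible at rates above $C^{\text{one}}_{\text{memoryless}}(D_1, \mathscr{W}_K)$. Consider the class

$$
\mathscr{W}^{\epsilon}_K(p_{\mathbf{x}_{\mathcal{K}}}) = \left\{ \tilde{p}_{Y|X_{\mathcal{K}}} \in \mathscr{P}_{Y|X_{\mathcal{K}}} \,:\ \min_{p_{Y|X_{\mathcal{K}}} \in \mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}})}\ \max_{x_{\mathcal{K}}, y} \left| \tilde{p}_{Y|X_{\mathcal{K}}}(y|x_{\mathcal{K}}) - p_{Y|X_{\mathcal{K}}}(y|x_{\mathcal{K}}) \right| \le \epsilon \right\}, \quad \epsilon > 0, \tag{B.2}
$$

which is slightly larger than $\mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$ but shrinks towards $\mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$ as $\epsilon \downarrow 0$. Continuity of mutual
information with respect to variational distance implies that

$$
C^{\text{one}}(D_1, \mathscr{W}^{\epsilon}_K) \uparrow C^{\text{one}}(D_1, \mathscr{W}_K) \quad \text{as } \epsilon \downarrow 0. \tag{B.3}
$$

We now claim that if the coalition selects a memoryless channel from $\mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$, the constraint
$p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathscr{W}^{\epsilon}_K(p_{\mathbf{x}_{\mathcal{K}}})$ is satisfied with probability approaching 1 as $N \to \infty$:

$$
\forall \epsilon > 0: \quad p_{Y|X_{\mathcal{K}}} \in \mathscr{W}_K(p_{\mathbf{x}_{\mathcal{K}}}) \ \Rightarrow\ \Pr[p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathscr{W}^{\epsilon}_K(p_{\mathbf{x}_{\mathcal{K}}})] \ge 1 - \epsilon \quad \forall N > N_0(\epsilon). \tag{B.4}
$$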
To show this, define the set 

-K 



£=<:s.K ■■ min p^^^ixK) > 

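Without loss of generality assume $f_N$ is such that

$$
\Pr[\mathbf{x}_{\mathcal{K}} \in \mathcal{E}] \ge 1 - \epsilon/2 \tag{B.5}
$$

where the probability is taken with respect to $M_{\mathcal{K}}$, $\mathbf{S}$, $V$. For any $\mathbf{x}_{\mathcal{K}} \in \mathcal{E}$, $x_{\mathcal{K}} \in \mathcal{X}^K$, $y \in \mathcal{Y}$, if $\mathbf{y}$
is generated conditionally i.i.d. $p_{Y|X_{\mathcal{K}}}$, the random variable $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}(y|x_{\mathcal{K}})$ converges in probability to
$p_{Y|X_{\mathcal{K}}}(y|x_{\mathcal{K}})$ as $N \to \infty$. Hence

$$
\Pr\nolimits_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}} = \mathbf{x}_{\mathcal{K}}}\left[ \max_{x_{\mathcal{K}}, y} \left| p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}(y|x_{\mathcal{K}}) - p_{Y|X_{\mathcal{K}}}(y|x_{\mathcal{K}}) \right| \le \epsilon \right] \ge 1 - \epsilon/2, \quad \forall \mathbf{x}_{\mathcal{K}} \in \mathcal{E} \tag{B.6}
$$

for any $N > N_0(\epsilon)$. Combining (B.5) and (B.6), we obtain (B.4).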
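The convergence invoked for (B.6) is the law of large numbers applied to the empirical conditional type. The Python sketch below simulates a memoryless binary channel and measures the largest deviation between the empirical conditional frequencies and the true channel. For simplicity it uses a single binary input in place of the $K$-tuple $x_{\mathcal{K}}$, and the channel values, seed, and function name are assumptions chosen for illustration.

```python
import random
from collections import Counter

def max_type_deviation(n, channel, seed=0):
    """Largest |empirical P(Y=1|x) - channel[x]| over x, for n channel uses.

    channel maps each input symbol x in {0, 1} to the true P(Y = 1 | X = x).
    With binary Y, the deviation for Y = 0 equals the one for Y = 1.
    """
    rng = random.Random(seed)
    x = [rng.randrange(2) for _ in range(n)]
    y = [1 if rng.random() < channel[xi] else 0 for xi in x]
    pair_counts = Counter(zip(x, y))
    input_counts = Counter(x)
    deviation = 0.0
    for xi in (0, 1):
        empirical = pair_counts[(xi, 1)] / input_counts[xi]
        deviation = max(deviation, abs(empirical - channel[xi]))
    return deviation

channel = {0: 0.1, 1: 0.7}
for n in (200, 2000, 20000):
    print(n, max_type_deviation(n, channel))
```

Each input symbol must occur often enough for its conditional frequencies to concentrate, which is the purpose of restricting attention to the set of codeword tuples used in (B.5).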

A lower bound on error probability is obtained when a helper provides some information to
the decoder. Assume the constraint on the coalition is slightly relaxed so that they are allowed to
produce pirated copies that violate the requirement $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathscr{W}^{\epsilon}_K(p_{\mathbf{x}_{\mathcal{K}}})$ with probability at most $\epsilon$,
as in (B.4), but also that the helper reveals the entire coalition to the decoder in this event. This
contributes at most $\epsilon K N R$ bits of information to the decoder and does not increase the decoder's
error probability. Hence

^;^) + e < Crmorylcss(^l, ^i^)- 

Combining this inequality with (B.3) establishes (B.1). □

Footnote to (B.5). One may always "fill in" each codeword $\mathbf{x}_m$ with $2\epsilon|\mathcal{X}|^K N$ dummy symbols drawn from the uniform p.m.f. on
$\mathcal{X}$ to ensure that (B.5) holds. The rate loss due to the "fill-in" symbols vanishes as $\epsilon \to 0$.